Abstract

We consider a classical finite horizon optimal control problem for continuous-time pure 
jump Markov processes described by means of a rate transition measure depending on a 
control parameter and controlled by a feedback law. For this class of problems the value
function can often be described as the unique solution to the corresponding Hamilton-Jacobi-Bellman
equation. We prove a probabilistic representation for the value function, known as a
nonlinear Feynman-Kac formula. It relates the value function with a backward stochastic
differential equation (BSDE) driven by a random measure and with a sign constraint on its 
martingale part. We also prove existence and uniqueness results for this class of constrained 
BSDEs. The connection of the control problem with the constrained BSDE uses a control 
randomization method recently developed in the works of I. Kharroubi and H. Pham and 
their co-authors. This approach also allows us to prove that the value function of the original
non-dominated control problem coincides with the value function of an auxiliary dominated 
control problem, expressed in terms of equivalent changes of probability measures. 


1 Introduction 

The main aim of this paper is to prove that the value function in a classical optimal control 
problem for pure jump Markov processes can be represented by means of an appropriate Backward 
Stochastic Differential Equation (BSDE) that we introduce and for which we prove an existence 
and uniqueness result. 

We start by describing our setting in an informal way. A pure jump Markov process $X$ in a
general measurable state space $(E,\mathcal{E})$ can be described by means of a rate transition measure, or
intensity measure, $\nu(t,x,B)$ defined for $t \ge 0$, $x \in E$, $B \in \mathcal{E}$. The process starts at time $t \ge 0$
from some initial point $x \in E$ and stays there up to a random time $T_1$ such that

\[ P(T_1 > s) = \exp\Big(-\int_t^s \nu(r,x,E)\,dr\Big), \qquad s \ge t. \]

At time $T_1$, the process jumps to a new point $X_{T_1}$ chosen with probability $\nu(T_1,x,\cdot)/\nu(T_1,x,E)$
(conditionally to $T_1$) and then it stays again at $X_{T_1}$ up to another random time $T_2$ such that

\[ P(T_2 > s \mid T_1, X_{T_1}) = \exp\Big(-\int_{T_1}^s \nu(r,X_{T_1},E)\,dr\Big), \qquad s \ge T_1, \]

and so on.

A controlled pure jump Markov process is obtained starting from a rate measure $\lambda(x,a,B)$
defined for $x \in E$, $a \in A$, $B \in \mathcal{E}$, i.e., depending on a control parameter $a$ taking values in
a measurable space of control actions $(A,\mathcal{A})$. A natural way to control a Markov process is to
choose a feedback control law, which is a measurable function $\alpha : [0,\infty) \times E \to A$; here $\alpha(t,x) \in A$ is
the control action selected at time $t$ if the system is in state $x$. The controlled Markov process $X$
is simply the one corresponding to the rate transition measure $\lambda(x,\alpha(t,x),B)$. Let us denote by
$P^{t,x}_\alpha$ the corresponding law, where $t, x$ are the initial time and starting point.
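
The informal description above can be illustrated by a small simulation. The following is only a sketch under assumptions that are not part of the paper: the state space is finite, the controlled rate measure is given as a dictionary of jump rates, and a constant `rate_bound` dominating the total jump rate is known. The names `simulate_path`, `toy_kernel` and `toy_feedback` are hypothetical. Because the feedback law may depend on time, the total jump rate can vary between jumps, so the sketch uses thinning of a Poisson clock with rate `rate_bound` rather than sampling the holding time directly.

```python
import random


def simulate_path(rate_kernel, feedback, rate_bound, t0, x0, horizon, rng):
    """Simulate one trajectory of a controlled jump process on [t0, horizon].

    rate_kernel(x, a) -> dict {y: rate of jumping from x to y under action a}
                         (hypothetical finite-state stand-in for the rate measure)
    feedback(t, x)    -> control action selected at time t in state x
    rate_bound        -> any constant >= the largest total jump rate (assumed finite)
    Returns the list of (time, state) pairs, starting with (t0, x0).
    """
    t, x = t0, x0
    path = [(t0, x0)]
    while True:
        # Candidate jump times arrive at the constant dominating rate.
        t += rng.expovariate(rate_bound)
        if t >= horizon:
            return path
        jump_rates = rate_kernel(x, feedback(t, x))
        total = sum(jump_rates.values())
        # Thinning: accept the candidate with probability total / rate_bound.
        if rng.random() * rate_bound >= total:
            continue
        # Choose the target state with probability rate / total.
        u = rng.random() * total
        acc = 0.0
        for y, rate in jump_rates.items():
            acc += rate
            if u < acc:
                break
        x = y
        path.append((t, x))


def toy_kernel(x, a):
    # Two states 0 and 1; the action a is the jump rate to the other state.
    return {1 - x: a}


def toy_feedback(t, x):
    # Leave state 0 quickly and state 1 slowly.
    return 3.0 if x == 0 else 1.0


path = simulate_path(toy_kernel, toy_feedback, 3.0, 0.0, 0, 5.0, random.Random(0))
```

In the toy example every accepted jump switches between the two states, so the recorded states alternate; the boundedness of the total rate is what makes the thinning step legitimate.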

We note that an alternative construction of (controlled or uncontrolled) Markov processes 
consists in defining them as solutions to stochastic equations driven by some noise (for instance, 
by a Poisson process) and with appropriate coefficients depending on a control process. In the 
context of pure jump processes, our approach based on the introduction of the controlled rate 
measure $\lambda(x,a,B)$ often leads to more general results and it is more natural in several contexts.

In the classical finite horizon control problem one seeks to maximize over all control laws $\alpha$ a
functional of the form

\[ J(t,x,\alpha) = E^{t,x}_\alpha\Big[\int_t^T f(s,X_s,\alpha(s,X_s))\,ds + g(X_T)\Big], \tag{1.1} \]

where a deterministic finite horizon $T > 0$ is given and $f, g$ are given real functions, defined on
$[0,T] \times E \times A$ and $E$, representing the running cost and the terminal cost, respectively. The value
function of the control problem is defined in the usual way:

\[ v(t,x) = \sup_{\alpha} J(t,x,\alpha), \qquad t \in [0,T],\ x \in E. \tag{1.2} \]

We will only consider the case when the controlled rate measure $\lambda$ and the costs $f, g$ are
bounded. Then, under some technical assumptions, $v$ is known to be the unique solution on
$[0,T] \times E$ to the Hamilton-Jacobi-Bellman (HJB) equation

\[
\begin{cases}
-\dfrac{\partial v}{\partial t}(t,x) = \sup_{a \in A}\Big(\displaystyle\int_E \big(v(t,y) - v(t,x)\big)\,\lambda(x,a,dy) + f(t,x,a)\Big), \\[2mm]
v(T,x) = g(x),
\end{cases}
\tag{1.3}
\]

and if the supremum is attained at some $\alpha(t,x) \in A$ depending measurably on $(t,x)$ then $\alpha$ is an
optimal feedback law. Note that the right-hand side of (1.3) is an integral operator: this allows
for simple notions of solution to the HJB equation which, in particular, do not require the
theory of viscosity solutions.

Our purpose is to relate the value function $v(t,x)$ to an appropriate BSDE. We wish to
extend to our framework the theory developed in the context of classical optimal control for
diffusion processes, constructed as solutions to stochastic differential equations of Itô type driven
by Brownian motion, where representation formulae for the solution to the HJB equation exist
and are often called non-linear Feynman-Kac formulae. The majority of those results require that
only the drift coefficient of the stochastic equation depends on the control parameter, so that
in this case the HJB equation is a second-order semi-linear partial differential equation and the
non-linear Feynman-Kac formula is well known, see e.g. [14]. Generally, in this case the laws of
the corresponding controlled processes are all absolutely continuous with respect to the law of a
given, uncontrolled process, so that they form a dominated model.

A natural extension to our framework could be obtained by imposing conditions implying that
the set of probability laws $\{P^{t,x}_\alpha\}_\alpha$, when $\alpha$ varies over all feedback laws, is a dominated model.
This is the point of view taken in [8], where an appropriate BSDE is introduced and solved and
a Feynman-Kac formula for the value function is proved in a restricted framework. Extensions are
given in [1] to controlled semi-Markov processes and in [7] to more general non-Markovian cases.

In the present paper we want to consider the general case when $\{P^{t,x}_\alpha\}_\alpha$ is not a dominated
model. Even for finite state space $E$, by a proper choice of the measure $\lambda(x,a,B)$ it is easy to
formulate quite natural control problems for which this is the case.

In the context of controlled diffusions, probabilistic formulae for the value function for non-dominated
models have been discovered only in recent years. We note that in this case the HJB
equation is a fully non-linear partial differential equation. To our knowledge, there are only a few 
available techniques. One possibility is to use the theory of second-order BSDEs, see for instance 
m, m- Another possibility relies on the use of the theory of G-expectations, see e.g. |2S]. Both 
theories have been largely developed by several authors. In this paper we rather follow another 
approach which is presented in the paper m and was predated by similar results concerning 
optimal switching or optimal irnpuse control problems, see mi. m m, eqi, and followed by 
some extensions and applications, see m, 0 , m- It consists in a control randomization method 
(not to be confused with the use of relaxed controls) which can be described informally as follows, 
in our framework of controlled pure jump Markov processes. 

We note that for any choice of a feedback law $\alpha$ the pair of stochastic processes $(X_s, \alpha(s,X_s))$
represents the state trajectory and the associated control process. In a first step, for any initial
time $t \ge 0$ and starting point $x \in E$, we replace it by an (uncontrolled) Markovian pair of pure
jump stochastic processes $(X_s, I_s)$, possibly constructed on a different probability space, in such
a way that the process $I$ is a Poisson process with values in the space of control actions $A$ with an
intensity measure $\lambda_0(da)$ which is arbitrary but finite and with full support. Next we formulate
an auxiliary optimal control problem where we control the intensity of the process $I$: for any
predictable, bounded and positive random field $\nu_t(a)$, by means of a theorem of Girsanov type
we construct a probability measure $P_\nu$ under which the compensator of $I$ is the random measure
$\nu_t(a)\,\lambda_0(da)\,dt$ (under $P_\nu$ the law of $X$ also changes) and then we maximize the functional

\[ E_\nu\Big[ g(X_T) + \int_t^T f(s, X_s, I_s)\,ds \Big] \]

over all possible choices of the process $\nu$. Following the terminology of [21], this will be called the
dual control problem. Its value function, denoted $v^*(t,x,a)$, also depends a priori on the starting
point $a \in A$ of the process $I$ (in fact we should write $P^{t,x,a}_\nu$ instead of $P_\nu$, but in this discussion we
drop this dependence for simplicity) and the family $\{P_\nu\}_\nu$ is a dominated model. As in [21] we
are able to show that the value functions for the original problem and the dual one are the same:
$v(t,x) = v^*(t,x,a)$, so that the latter does not in fact depend on $a$. In particular we have replaced
the original control problem by a dual one that corresponds to a dominated model and has the
same value function. Moreover, we can introduce a well-posed BSDE that represents $v^*(t,x,a)$
(and hence $v(t,x)$). It is an equation on the time interval $[t,T]$ of the form


\[
\begin{aligned}
Y_s ={}& g(X_T) + \int_s^T f(r,X_r,I_r)\,dr + K_T - K_s \\
& - \int_s^T\!\!\int_{E\times A} Z_r(y,b)\,q(dr\,dy\,db) - \int_s^T\!\!\int_A Z_r(X_r,b)\,\lambda_0(db)\,dr,
\end{aligned}
\tag{1.4}
\]

with unknown triple $(Y,Z,K)$ (depending also on $(t,x,a)$), where $q$ is the compensated random
measure associated to $(X,I)$, $Z$ is a predictable random field and $K$ a predictable increasing
càdlàg process, and where we additionally impose the sign constraint

\[ Z_s(X_{s-}, b) \le 0. \tag{1.5} \]

It turns out that this equation has a unique minimal solution, in an appropriate sense, and that
the value of the process $Y$ at the initial time represents both the original and the dual value
function:

\[ Y_t = v(t,x) = v^*(t,x,a). \tag{1.6} \]

This is the desired BSDE representation of the value function for the original control problem
and a Feynman-Kac formula for the general HJB equation (1.3).

The paper is organized as follows. Section 2 is essentially devoted to laying down a setting where
the classical optimal control problem (1.2) is solved by means of the corresponding HJB equation
(1.3). We first recall the general construction of a Markov process given its rate transition measure.
Since we intend to apply techniques based on BSDEs driven by random measures, we need to work
in a canonical setting and use a specific filtration, see Remark 2.2. Therefore the construction we
present is based on the well-posedness of the martingale problem for multivariate (marked) point
processes studied in [18] and it is described in detail. This general construction is then used to
formulate in a precise way the optimal control problem for the jump Markov process and it is used
again in the subsequent section when we define the pair $(X, I)$ mentioned above. Still in Section
2 we present classical results on existence and uniqueness of the solution to the HJB equation
(1.3) and its identification with the value function $v$. These results are similar to those in [26],
a place where we could find a clear and complete exposition of all the basic theory and to which
we refer for further references and related results. We note that the compactness of the space of
control actions $A$, together with suitable upper-semicontinuity conditions on the coefficients of the
control problem, is one of the standard assumptions needed to ensure the existence of an optimal
control, which is usually constructed by means of an appropriate measurable selection theorem.
Since our main aim was only to find a representation formula for the value function we wished
to avoid the compactness condition. This was made possible by the use of a different measurable
selection result, which however requires lower-semicontinuity conditions. Although this is not usual
in the context of maximization problems, this turned out to be the right condition that allows us
to dispense with compactness assumptions and to prove well-posedness of the HJB equation and
a verification theorem. A small variation of the proofs recovers the classical results in [26], and
even with slightly weaker assumptions: see Remark 2.11 for a more detailed comparison.

In Section 3 we start to develop the control randomization method: we introduce the auxiliary
process $(X, I)$ and formulate the dual control problem under appropriate conditions. Finding
the correct formulation required some effort; in particular we could not mimic the approach of
previous works on control randomization mentioned above, since we are not dealing with processes
defined as solutions to stochastic equations.

In Section 4 we introduce the constrained BSDE (1.4)-(1.5) and we prove, under suitable conditions,
that it has a unique minimal solution $(Y, Z, K)$ in a certain class of processes. Moreover,
the value of $Y$ at the initial time coincides with the value function of the dual optimal control
problem. This is the content of the first of our main results, Theorem 4.2. The proof relies on a
penalization approach and a monotonic passage to the limit, and combines BSDE techniques with 
control-theoretic arguments: for instance, a “penalized” dual control problem is also introduced 
in order to obtain certain uniform upper bounds. In m , in the context of diffusion processes, a 
more general result is proved, in the sense that the generator $f$ may also depend on $(Y, Z)$; similar
generalizations are possible in our context as well, but they seem less motivated and in any case 
they are not needed for the applications to optimal control. 

Finally, in Section 5 we prove the second of our main results, Theorem 5.1. It states that
the initial value of the process $Y$ in (1.4)-(1.5) coincides with the value function $v(t,x)$. As a
consequence, the value function is the same for the original optimal control problem and for the
dual one and we have the non-linear Feynman-Kac formula (1.6).

The assumptions in Theorem 5.1 are fairly general: the state space $E$ and the control action
space $A$ are Borel spaces, the controlled kernel $\lambda$ is bounded and has the Feller property, and the
cost functions $f, g$ are continuous and bounded. No compactness assumption is required. When
$E$ is finite or countable we have the special case of (continuous-time) controlled Markov chains.
A large class of optimization problems for controlled Markovian queues falls under the scope of
our result.

In recent years there has been much interest in numerical approximation of the value function 
in optimal control of Markov processes, see for instance the book m in the discrete state case. The 
Feynman-Kac formula (1.6) can be used to design algorithms based on numerical approximation
of the solution to the constrained BSDE (1.4)-(1.5). Numerical schemes for this kind of equation
have been proposed and analyzed in the context of diffusion processes, see [22], [23]. We hope
that our results may be used as a foundation for similar methods in the context of pure jump 
processes as well. 

2 Pure jump controlled Markov processes 

2.1 The construction of a jump Markov process given the rate transition measure

Let $E$ be a Borel space, i.e., a topological space homeomorphic to a Borel subset of a compact
metric space (some authors call it a Lusin space); in particular, $E$ could be a Polish space. Let $\mathcal{E}$
denote the corresponding Borel σ-algebra.

We will often need to construct a Markov process in $E$ with a given (time dependent) rate
transition measure, or intensity measure, denoted by $\nu$. With this terminology we mean that
$B \mapsto \nu(t,x,B)$ is a nonnegative measure on $(E,\mathcal{E})$ for every $(t,x) \in [0,\infty) \times E$ and $(t,x) \mapsto \nu(t,x,B)$
is a Borel measurable function on $[0,\infty) \times E$ for every $B \in \mathcal{E}$. We assume that

\[ \sup_{t \ge 0,\, x \in E} \nu(t,x,E) < \infty. \tag{2.1} \]

We recall the main steps in the construction of the corresponding Markov process. We note
that (2.1) allows us to construct a non-explosive process. Since $\nu$ depends on time the process will
not be time-homogeneous in general. Although the existence of such a process is a well known
fact, we need special care in the choice of the corresponding filtration, since this will be crucial
when we solve associated BSDEs and implicitly apply a version of the martingale representation
theorem in the sections that follow: see also Remark 2.2 below. So in the following we will use
an explicit construction that we are going to describe. Many of the techniques we are going to 
use are borrowed from the theory of multivariate (marked) point processes. We will often follow 
m, but we also refer the reader to the treatise [4] for a more systematic exposition.

We start by constructing a suitable sample space to describe the jumping mechanism of the
Markov process. Let $\Omega'$ denote the set of sequences $\omega' = (t_n, e_n)_{n \ge 1}$ in $((0,\infty) \times E) \cup \{(\infty,\Delta)\}$,
where $\Delta \notin E$ is adjoined to $E$ as an isolated point, satisfying in addition

\[ t_n \le t_{n+1}, \qquad t_n < \infty \implies t_n < t_{n+1}. \tag{2.2} \]

To describe the initial condition we will use the measurable space $(E,\mathcal{E})$. Finally, the sample
space for the Markov process will be $\Omega = E \times \Omega'$. We define canonical functions $T_n : \Omega \to (0,\infty]$,
$E_n : \Omega \to E \cup \{\Delta\}$ as follows: writing $\omega = (e,\omega')$ in the form $\omega = (e, t_1, e_1, t_2, e_2, \ldots)$ we set for
$t \ge 0$ and for $n \ge 1$

\[ T_n(\omega) = t_n, \quad E_n(\omega) = e_n, \quad T_\infty(\omega) = \lim_{n \to \infty} t_n, \quad T_0(\omega) = 0, \quad E_0(\omega) = e. \]

We also define $X : \Omega \times [0,\infty) \to E \cup \{\Delta\}$ setting $X_t = 1_{[0,T_1]}(t)\,E_0 + \sum_{n \ge 1} 1_{(T_n,T_{n+1}]}(t)\,E_n$ for
$t < T_\infty$, $X_t = \Delta$ for $t \ge T_\infty$.

In $\Omega$ we introduce for all $t \ge 0$ the σ-algebras $\mathcal{G}_t = \sigma(N(s,A) : s \in (0,t], A \in \mathcal{E})$, i.e.
generated by the counting processes defined as $N(s,A) = \sum_{n \ge 1} 1_{T_n \le s}\,1_{E_n \in A}$.

To take into account the initial condition we also introduce the filtration $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$, where
$\mathcal{F}_0 = \mathcal{E} \otimes \{\emptyset, \Omega'\}$, and for all $t \ge 0$ $\mathcal{F}_t$ is the σ-algebra generated by $\mathcal{F}_0$ and $\mathcal{G}_t$. $\mathbb{F}$ is right-continuous
and will be called the natural filtration. In the following all concepts of measurability
for stochastic processes (adaptedness, predictability etc.) refer to $\mathbb{F}$. We denote by $\mathcal{F}_\infty$ the σ-algebra
generated by all the σ-algebras $\mathcal{F}_t$. The symbol $\mathcal{P}$ denotes the σ-algebra of $\mathbb{F}$-predictable
subsets of $[0,\infty) \times \Omega$.

The initial distribution of the process $X$ will be described by a probability measure $\mu$ on $(E,\mathcal{E})$.
Since $\mathcal{F}_0 = \{A \times \Omega' : A \in \mathcal{E}\}$ is isomorphic to $\mathcal{E}$, $\mu$ will be identified with a probability measure
on $\mathcal{F}_0$, denoted by the same symbol (by abuse of notation) and such that $\mu(A \times \Omega') = \mu(A)$.

On the filtered sample space $(\Omega,\mathbb{F})$ we have so far introduced the canonical marked point
process $(T_n, E_n)_{n \ge 1}$. The corresponding random measure $p$ is, for any $\omega \in \Omega$, a σ-finite measure
on $((0,\infty) \times E, \mathcal{B}((0,\infty)) \otimes \mathcal{E})$ defined as

\[ p(\omega, ds\,dy) = \sum_{n \ge 1} 1_{\{T_n(\omega) < \infty\}}\, \delta_{(T_n(\omega), E_n(\omega))}(ds\,dy), \]

where $\delta_k$ denotes the Dirac measure at point $k \in (0,\infty) \times E$.

Now let $\nu$ denote a time-dependent rate transition measure as before, satisfying (2.1). We
need to introduce the corresponding generator and transition semigroup as follows. We denote
by $B_b(E)$ the space of $\mathcal{E}$-measurable bounded real functions on $E$ and for $\phi \in B_b(E)$ we set

\[ \mathcal{L}_t\phi(x) = \int_E (\phi(y) - \phi(x))\,\nu(t,x,dy), \qquad t \ge 0,\ x \in E. \]

For any $T \in (0,\infty)$ and $g \in B_b(E)$ we consider the Kolmogorov equation on $[0,T] \times E$:

\[
\begin{cases}
\dfrac{\partial v}{\partial s}(s,x) + \mathcal{L}_s v(s,x) = 0, \\[2mm]
v(T,x) = g(x).
\end{cases}
\tag{2.3}
\]
It is easily proved that there exists a unique measurable bounded function $v : [0,T] \times E \to \mathbb{R}$ such that
$v(T,\cdot) = g$ on $E$ and, for all $x \in E$, $s \mapsto v(s,x)$ is an absolutely continuous map on $[0,T]$ and the
first equation in (2.3) holds for almost all $s \in [0,T]$ with respect to the Lebesgue measure. To
verify this we first write (2.3) in the equivalent integral form

\[ v(s,x) = g(x) + \int_s^T \mathcal{L}_r v(r,x)\,dr, \qquad s \in [0,T],\ x \in E. \]

Then, noting the inequality $|\mathcal{L}_t\phi(x)| \le 2 \sup_{y \in E} |\phi(y)| \, \sup_{t \in [0,T],\, y \in E} \nu(t,y,E)$, a solution to the
latter equation can be obtained by a standard fixed point argument in the space of bounded
measurable real functions on $[0,T] \times E$ endowed with the supremum norm.

This allows us to define the transition operator $P_{sT} : B_b(E) \to B_b(E)$, for $0 \le s \le T$, letting
$P_{sT}[g](x) = v(s,x)$, where $v$ is the solution to (2.3) with terminal condition $g \in B_b(E)$.
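
The fixed point argument for the integral form of the Kolmogorov equation can be tried out numerically. The sketch below rests on assumptions that are not in the paper: a finite state space, a time-homogeneous rate matrix `rates[x][y]`, and a right-endpoint Riemann sum on a uniform time grid in place of the exact time integral. The function names `generator_apply` and `kolmogorov_picard` are hypothetical. Each sweep applies the map $v \mapsto g + \int_\cdot^T \mathcal{L}_r v(r,\cdot)\,dr$ once, starting from the guess $v \equiv g$.

```python
import math


def generator_apply(rates, phi):
    """Return (L phi)(x) = sum_y rates[x][y] * (phi[y] - phi[x]) for each state x."""
    n = len(phi)
    return [sum(rates[x][y] * (phi[y] - phi[x]) for y in range(n)) for x in range(n)]


def kolmogorov_picard(rates, g, horizon, n_steps, n_iter):
    """Picard iteration for v(s, x) = g(x) + int_s^T (L v)(r, x) dr on a time grid.

    Returns v as a list indexed by grid point k (time k * horizon / n_steps),
    each entry being the list of values over the states.
    """
    dt = horizon / n_steps
    n = len(g)
    v = [list(g) for _ in range(n_steps + 1)]  # initial guess: v(s, .) = g
    for _ in range(n_iter):
        lv = [generator_apply(rates, v[k]) for k in range(n_steps + 1)]
        new_v = [None] * (n_steps + 1)
        new_v[n_steps] = list(g)
        integral = [0.0] * n
        for k in range(n_steps - 1, -1, -1):
            # Right-endpoint rule on the grid cell [s_k, s_{k+1}].
            integral = [integral[x] + dt * lv[k + 1][x] for x in range(n)]
            new_v[k] = [g[x] + integral[x] for x in range(n)]
        v = new_v
    return v


# Two-state example: rate 1 from state 0 to 1, rate 2 from state 1 to 0.
example_rates = [[0.0, 1.0], [2.0, 0.0]]
example_v = kolmogorov_picard(example_rates, [0.0, 1.0], 1.0, 2000, 40)
```

For the two-state example with $g = 1_{\{1\}}$, the value at time $0$ in state $0$ is the probability of being in state $1$ at time $T$, which for this chain has the closed form $\tfrac{1}{3}(1 - e^{-3T})$; the iterate can be compared against it up to the discretization error of the Riemann sum.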

Proposition 2.1. Let (2.1) hold and let us fix $t \in [0,\infty)$ and a probability measure $\mu$ on $(E,\mathcal{E})$.

1. There exists a unique probability measure on $(\Omega, \mathcal{F}_\infty)$, denoted by $P^{t,\mu}$, such that its restriction
to $\mathcal{F}_0$ is $\mu$ and the $\mathbb{F}$-compensator (or dual predictable projection) of the measure
$p$ under $P^{t,\mu}$ is the random measure $\tilde p(ds\,dy) := 1_{[t,T_\infty)}(s)\,\nu(s, X_{s-}, dy)\,ds$. Moreover,
$P^{t,\mu}(T_\infty = \infty) = 1$.

2. In the probability space $(\Omega, \mathcal{F}_\infty, P^{t,\mu})$ the process $X$ has distribution $\mu$ at time $t$ and it is
Markov on the time interval $[t,\infty)$ with respect to $\mathbb{F}$ with transition operator $P_{sT}$: explicitly,
for every $t \le s \le T$ and for every $g \in B_b(E)$,

\[ E^{t,\mu}[g(X_T) \mid \mathcal{F}_s] = P_{sT}[g](X_s), \qquad P^{t,\mu}\text{-a.s.} \]

Proof. Point 1 follows from a direct application of [18], Theorem 3.6. The non-explosion condition
$P^{t,\mu}(T_\infty = \infty) = 1$ follows from the fact that $\nu$ is bounded, see (2.1).

To prove point 2 we denote by $v(s,x) = P_{sT}[g](x)$ the solution to the Kolmogorov equation (2.3)
and note that

\[ v(T,X_T) - v(s,X_s) = \int_s^T \frac{\partial v}{\partial r}(r,X_r)\,dr + \int_{(s,T]}\int_E \big(v(r,y) - v(r,X_{r-})\big)\,p(dr\,dy). \]

This identity is easily proved taking into account that $X$ is constant between jump times and using
the definition of the random measure $p$. Recalling the form of the $\mathbb{F}$-compensator $\tilde p$ of $p$ under
$P^{t,\mu}$ we have, $P^{t,\mu}$-a.s.,

\begin{align*}
E^{t,\mu}\Big[\int_{(s,T]}\int_E \big(v(r,y) - v(r,X_{r-})\big)\,p(dr\,dy) \,\Big|\, \mathcal{F}_s\Big]
&= E^{t,\mu}\Big[\int_{(s,T]}\int_E \big(v(r,y) - v(r,X_{r-})\big)\,\tilde p(dr\,dy) \,\Big|\, \mathcal{F}_s\Big] \\
&= E^{t,\mu}\Big[\int_{(s,T]}\int_E \big(v(r,y) - v(r,X_r)\big)\,\nu(r,X_r,dy)\,dr \,\Big|\, \mathcal{F}_s\Big] \\
&= E^{t,\mu}\Big[\int_{(s,T]} \mathcal{L}_r v(r,X_r)\,dr \,\Big|\, \mathcal{F}_s\Big]
\end{align*}

and we finally obtain

\begin{align*}
E^{t,\mu}[g(X_T) \mid \mathcal{F}_s] - P_{sT}[g](X_s)
&= E^{t,\mu}[v(T,X_T) \mid \mathcal{F}_s] - v(s,X_s) \\
&= E^{t,\mu}\Big[\int_s^T \Big(\frac{\partial v}{\partial r}(r,X_r) + \mathcal{L}_r v(r,X_r)\Big)\,dr \,\Big|\, \mathcal{F}_s\Big] = 0.
\end{align*}

□


In the following we will mainly consider initial distributions $\mu$ concentrated at some point
$x \in E$, i.e. $\mu = \delta_x$. In this case we use the notation $P^{t,x}$ rather than $P^{t,\delta_x}$. Note that, $P^{t,x}$-a.s.,
we have $T_1 > t$ and therefore $X_s = x$ for all $s \in [0,t]$.

Remark 2.2. Since the process $X$ is $\mathbb{F}$-adapted, its natural filtration $\mathbb{F}^X = (\mathcal{F}^X_t)_{t \ge 0}$ defined by
$\mathcal{F}^X_t = \sigma(X_s : s \in [0,t])$ is smaller than $\mathbb{F}$. The inclusion may be strict, and may remain such
if we consider the corresponding completed filtrations. The reason is that the random variables
$E_n$ and $E_{n+1}$ introduced above may coincide on a set of positive probability, for some $n$, and
therefore knowledge of a trajectory of $X$ does not allow us to reconstruct the trajectory $(T_n, E_n)$.

In order to have $\mathcal{F}_s = \mathcal{F}^X_s$ up to $P^{t,\mu}$-null sets one could require that $\nu(t,x,\{x\}) = 0$, i.e. that
the $T_n$ are in fact jump times of $X$, but this would impose unnecessary restrictions in some constructions
that follow.

Clearly, the Markov property with respect to $\mathbb{F}$ implies the Markov property with respect to
$\mathbb{F}^X$ as well.


2.2 Optimal control of pure jump Markov processes 

In this section we formulate and solve an optimal control problem for a Markov process with a
state space $E$, which is still assumed to be a Borel space with its Borel σ-algebra $\mathcal{E}$. The other
data of the problem will be another Borel space $A$, endowed with its Borel σ-algebra $\mathcal{A}$ and called
the space of control actions; a finite time horizon, i.e. a (deterministic) element $T \in (0,\infty)$; two
real valued functions $f$ and $g$, defined on $[0,T] \times E \times A$ and $E$ and called running and terminal
cost functions respectively; and finally a measure transition kernel $\lambda$ from $(E \times A, \mathcal{E} \otimes \mathcal{A})$ to
$(E,\mathcal{E})$: namely $B \mapsto \lambda(x,a,B)$ is a nonnegative measure on $(E,\mathcal{E})$ for every $(x,a) \in E \times A$ and
$(x,a) \mapsto \lambda(x,a,B)$ is a Borel measurable function for every $B \in \mathcal{E}$. We assume that $\lambda$ satisfies
the following condition:

\[ \sup_{x \in E,\, a \in A} \lambda(x,a,E) < \infty. \tag{2.4} \]

The requirement that $\lambda(x,a,\{x\}) = 0$ for all $x \in E$ and $a \in A$ is natural in many applications,
but it is not needed. The kernel $\lambda$ depending on the control parameter $a \in A$ plays the role of a
controlled intensity measure for a controlled Markov process. Roughly speaking, we may control
the dynamics of the process by changing its jump intensity dynamically. For a more precise
definition, we first construct $\Omega$, $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$, $\mathcal{F}_\infty$ as in the previous paragraph. Then we introduce
the class of admissible control laws $\mathcal{A}_{ad}$ as the set of all Borel-measurable maps $\alpha : [0,T] \times E \to A$.
To any such $\alpha$ we associate the rate transition measure $\nu^\alpha(t,x,dy) := \lambda(x,\alpha(t,x),dy)$.

For every starting time $t \in [0,T]$ and starting point $x \in E$, and for each $\alpha \in \mathcal{A}_{ad}$, we
construct as in the previous paragraph the probability measure on $(\Omega, \mathcal{F}_\infty)$, that will be denoted
$P^{t,x}_\alpha$, corresponding to $t$, to the initial distribution concentrated at $x$ and to the rate transition
measure $\nu^\alpha$. According to Proposition 2.1, under $P^{t,x}_\alpha$ the process $X$ is Markov with respect to $\mathbb{F}$
and satisfies $X_s = x$ for every $s \in [0,t]$; moreover, the restriction of the measure $p$ to $(t,\infty) \times E$
admits the compensator $\lambda(X_{s-}, \alpha(s,X_{s-}), dy)\,ds$. Denoting by $E^{t,x}_\alpha$ the expectation under $P^{t,x}_\alpha$
we finally define, for $t \in [0,T]$, $x \in E$ and $\alpha \in \mathcal{A}_{ad}$, the gain functional

\[ J(t,x,\alpha) = E^{t,x}_\alpha\Big[\int_t^T f(s,X_s,\alpha(s,X_s))\,ds + g(X_T)\Big], \tag{2.5} \]

and the value function of the control problem

\[ V(t,x) = \sup_{\alpha \in \mathcal{A}_{ad}} J(t,x,\alpha). \tag{2.6} \]

Since we will assume below that $f$ and $g$ are at least Borel-measurable and bounded, both $J$ and
$V$ are well defined and bounded.

Remark 2.3. In this formulation the only control strategies that we consider are control laws of 
feedback type, i.e., the control action a(t,x) at time t only depends on t and on the state x for 
the controlled system at the same time. This is a natural and frequently adopted formulation. 
Different formulations are possible, but usually the corresponding value function is the same and, 
if an optimal control exists, it is of feedback type. 

Remark 2.4. All the results that follow admit natural extensions to slightly more general cases.
For instance, $\lambda$ might depend on time, or the set of admissible control actions may depend on
the present state (so admissible control laws should satisfy $\alpha(t,x) \in A(x)$, where $A(x)$ is a given
subset of $A$) provided appropriate measurability conditions are satisfied. We limit ourselves to
the previous setting in order to simplify the notation.

Let us consider the Hamilton-Jacobi-Bellman equation (for short, HJB equation) related to the
optimal control problem: this is the following nonlinear integro-differential equation on $[0,T] \times E$:

\[ -\frac{\partial v}{\partial t}(t,x) = \sup_{a \in A}\big(\mathcal{L}^a_E v(t,x) + f(t,x,a)\big), \tag{2.7} \]

\[ v(T,x) = g(x), \tag{2.8} \]

where the operator $\mathcal{L}^a_E$ is defined by

\[ \mathcal{L}^a_E\phi(x) = \int_E (\phi(y) - \phi(x))\,\lambda(x,a,dy) \tag{2.9} \]

for all $(t,x,a) \in [0,T] \times E \times A$ and every bounded Borel-measurable function $\phi : E \to \mathbb{R}$.
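
To make the structure of the HJB equation concrete, the following sketch integrates it backward in time on a toy problem. Everything specific here is an assumption made for illustration and not part of the paper: the state space and the action set are finite, the kernel is given as a dictionary of jump rates, and the time derivative is replaced by an explicit Euler step (which needs the step size times the largest total rate to be below one). The names `solve_hjb` and `toy_rates` are hypothetical. The sketch also records the maximizing action at each grid point, which plays the role of a feedback law.

```python
import math


def solve_hjb(rate_kernel, running, terminal, actions, n_states, horizon, n_steps):
    """Explicit backward Euler scheme for -dv/dt = max_a (L^a v + f), v(T, .) = g.

    rate_kernel(x, a) -> dict {y: jump rate from x to y under action a}
    running(t, x, a)  -> running cost f, terminal(x) -> terminal cost g
    Returns (values, policy): values[k][x] approximates v(k * dt, x) and
    policy[k][x] is a maximizing action on the k-th time step.
    """
    dt = horizon / n_steps
    v = [terminal(x) for x in range(n_states)]
    values = [None] * (n_steps + 1)
    policy = [None] * n_steps
    values[n_steps] = v
    for k in range(n_steps - 1, -1, -1):
        t_next = (k + 1) * dt
        new_v, greedy = [], []
        for x in range(n_states):
            best_val, best_a = -math.inf, None
            for a in actions:
                jump_rates = rate_kernel(x, a)
                # Hamiltonian term: integral of (v(y) - v(x)) against the rates, plus f.
                ham = sum(r * (v[y] - v[x]) for y, r in jump_rates.items())
                ham += running(t_next, x, a)
                if ham > best_val:
                    best_val, best_a = ham, a
            new_v.append(v[x] + dt * best_val)
            greedy.append(best_a)
        v = new_v
        values[k] = v
        policy[k] = greedy
    return values, policy


def toy_rates(x, a):
    # State 0 jumps to state 1 at the chosen rate a; state 1 is absorbing.
    return {1: a} if x == 0 else {}


toy_values, toy_policy = solve_hjb(
    toy_rates, lambda t, x, a: 0.0, lambda x: float(x), [1.0, 3.0], 2, 1.0, 2000
)
```

In the toy problem the reward is collected only by being in state 1 at time $T$, so the scheme selects the larger rate in state 0 and the value there approaches $1 - e^{-3T}$, the probability of having jumped by time $T$ at rate 3.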




Definition 2.1. We say that a Borel-measurable bounded function $v : [0,T] \times E \to \mathbb{R}$ is a solution
to the HJB equation if the right-hand side of (2.7) is Borel-measurable and, for every $x \in E$, (2.8)
holds, the map $t \mapsto v(t,x)$ is absolutely continuous in $[0,T]$ and (2.7) holds almost everywhere on
$[0,T]$ (the null set of points where it possibly fails may depend on $x$).

In the analysis of the HJB equation and the control problem we will use the following function
spaces, defined for any metric space $S$:

1. $C_b(S) = \{\phi : S \to \mathbb{R} \text{ continuous and bounded}\}$,

2. $LSC_b(S) = \{\phi : S \to \mathbb{R} \text{ lower semi-continuous and bounded}\}$,

3. $USC_b(S) = \{\phi : S \to \mathbb{R} \text{ upper semi-continuous and bounded}\}$.

$C_b(S)$, equipped with the supremum norm $\|\phi\|_\infty$, is a Banach space. $LSC_b(S)$ and $USC_b(S)$ are
closed subsets of the space of bounded real functions on $S$ with the supremum norm, hence complete
metric spaces with the induced distance.

In the sequel we need the following classical selection theorem. For a proof we refer for instance 
to [3], Propositions 7.33 and 7.34, where a more general statement can also be found. 

Proposition 2.5. Let $U$ be a metric space, $V$ a separable metric space. For $F : U \times V \to \mathbb{R}$ set

\[ F^*(u) = \sup_{v \in V} F(u,v), \qquad u \in U. \]

1. If $F \in USC_b(U \times V)$ and $V$ is compact then $F^* \in USC_b(U)$ and there exists a Borel-measurable
$\phi : U \to V$ such that

\[ F(u,\phi(u)) = F^*(u), \qquad u \in U. \]

2. If $F \in LSC_b(U \times V)$ then $F^* \in LSC_b(U)$ and for every $\epsilon > 0$ there exists a Borel-measurable
$\phi_\epsilon : U \to V$ such that

\[ F(u,\phi_\epsilon(u)) \ge F^*(u) - \epsilon, \qquad u \in U. \]

Next we present a well-posedness result and a verification theorem for the HJB equation in
the space $LSC_b([0,T] \times E)$, Theorems 2.6 and 2.9 below. The use of lower semi-continuous
bounded functions was already commented on in the introduction and will be useful for the results
in Section 5. A small variation of our arguments also yields corresponding results in the class
of upper semi-continuous functions, which are more natural when dealing with a maximization
problem, see Theorems 2.7 and 2.10, which slightly generalize classical results. We first formulate
the assumptions we need.

\[ \lambda \text{ is a Feller transition kernel.} \tag{2.10} \]

We recall that this means that for every $\phi \in C_b(E)$ the function $(x,a) \mapsto \int_E \phi(y)\,\lambda(x,a,dy)$ is
continuous (hence it belongs to $C_b(E \times A)$ by (2.4)).

Next we will assume either that

\[ f \in LSC_b([0,T] \times E \times A), \qquad g \in LSC_b(E), \tag{2.11} \]

or

\[ f \in USC_b([0,T] \times E \times A), \quad g \in USC_b(E) \quad \text{and } A \text{ is a compact metric space.} \tag{2.12} \]

Theorem 2.6. Under the assumptions (2.4), (2.10), (2.11) there exists a unique solution $v \in
LSC_b([0,T] \times E)$ to the HJB equation (in the sense of Definition 2.1).

Proof. We first make a change of unknown function setting $\tilde v(t,x) = e^{-\Lambda t} v(t,x)$, where $\Lambda :=
\sup_{x \in E,\, a \in A} \lambda(x,a,E)$ is finite by (2.4). It is immediate to check that $v$ is a solution to (2.7)-(2.8)
if and only if $\tilde v$ is a solution to

\begin{align}
-\frac{\partial \tilde v}{\partial t}(t,x) &= \sup_{a \in A}\big(\mathcal{L}^a_E \tilde v(t,x) + e^{-\Lambda t} f(t,x,a) + \Lambda \tilde v(t,x)\big) \notag \\
&= \sup_{a \in A}\Big(\int_E \tilde v(t,y)\,\lambda(x,a,dy) + (\Lambda - \lambda(x,a,E))\,\tilde v(t,x) + e^{-\Lambda t} f(t,x,a)\Big), \tag{2.13} \\
\tilde v(T,x) &= e^{-\Lambda T} g(x). \tag{2.14}
\end{align}

The notion of solution we adopt for (2.13)-(2.14) is completely analogous to Definition 2.1 and
need not be repeated. We set $\Gamma_{\tilde v}(t,x) := \int_t^T \sup_{a \in A} \gamma_{\tilde v}(s,x,a)\,ds$ where

\[ \gamma_{\tilde v}(t,x,a) := \int_E \tilde v(t,y)\,\lambda(x,a,dy) + (\Lambda - \lambda(x,a,E))\,\tilde v(t,x) + e^{-\Lambda t} f(t,x,a) \tag{2.15} \]

and note that solving (2.13)-(2.14) is equivalent to finding $\tilde v \in LSC_b([0,T] \times E)$ satisfying

\[ \tilde v(t,x) = e^{-\Lambda T} g(x) + \Gamma_{\tilde v}(t,x), \qquad t \in [0,T],\ x \in E. \]

We will prove that $\tilde v \mapsto e^{-\Lambda T} g + \Gamma_{\tilde v}$ is a well defined map of $LSC_b([0,T] \times E)$ into itself and it has a
unique fixed point, which is therefore the required solution.

Fix $\tilde v \in LSC_b([0,T] \times E)$. It follows easily from (2.4) that $\gamma_{\tilde v}$ is bounded and, if $\sup_{a \in A} \gamma_{\tilde v}(\cdot,\cdot,a)$
is Borel-measurable, $\Gamma_{\tilde v}$ is bounded as well. Next we prove that $\gamma_{\tilde v}$ and $\Gamma_{\tilde v}$ are lower semi-continuous.
Note that $(x,a) \mapsto \Lambda - \lambda(x,a,E)$ is continuous and nonnegative (this is the reason why
we introduced the equation for $\tilde v$), so

\[ (t,x,a) \mapsto (\Lambda - \lambda(x,a,E))\,\tilde v(t,x) + e^{-\Lambda t} f(t,x,a) \]

is in $LSC_b([0,T] \times E \times A)$. Since $\lambda$ is Feller, it is known that the map

\[ (t,x,a) \mapsto \int_E \tilde v(t,y)\,\lambda(x,a,dy) \tag{2.16} \]

is continuous when $\tilde v \in C_b([0,T] \times E)$ (see [3], Proposition 7.30). For general $\tilde v \in LSC_b([0,T] \times E)$,
there exists a uniformly bounded and increasing sequence $\tilde v_n \in C_b([0,T] \times E)$ such that $\tilde v_n \to \tilde v$
pointwise (see [3], Lemma 7.14). From the Fatou Lemma we deduce that the map (2.16) is
in $LSC_b([0,T] \times E \times A)$ and we conclude that $\gamma_{\tilde v} \in LSC_b([0,T] \times E \times A)$ as well. Therefore
$\sup_{a \in A} \gamma_{\tilde v}(\cdot,\cdot,a)$, which equals the right-hand side of (2.13), is lower semi-continuous and hence
Borel-measurable. To prove lower semi-continuity of $\Gamma_{\tilde v}$ suppose $(t_n,x_n) \to (t,x)$; then


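$$\Gamma_{\tilde v}(t_n,x_n) - \Gamma_{\tilde v}(t,x) = \int_{t_n}^{t} \sup_{a\in A}\gamma_{\tilde v}(s,x_n,a)\,ds + \int_t^T \Big(\sup_{a\in A}\gamma_{\tilde v}(s,x_n,a) - \sup_{a\in A}\gamma_{\tilde v}(s,x,a)\Big)\,ds$$

$$\qquad\qquad \ge -|t-t_n|\,\|\gamma_{\tilde v}\|_\infty + \int_t^T \Big(\sup_{a\in A}\gamma_{\tilde v}(s,x_n,a) - \sup_{a\in A}\gamma_{\tilde v}(s,x,a)\Big)\,ds.$$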


By the Fatou Lemma

$$\liminf_{n\to\infty}\big(\Gamma_{\tilde v}(t_n,x_n) - \Gamma_{\tilde v}(t,x)\big) \ge \int_t^T \liminf_{n\to\infty}\Big(\sup_{a\in A}\gamma_{\tilde v}(s,x_n,a) - \sup_{a\in A}\gamma_{\tilde v}(s,x,a)\Big)\,ds \ge 0,$$

where in the last inequality we have used the lower semi-continuity of $\sup_{a\in A}\gamma_{\tilde v}(\cdot,\cdot,a)$.

Since we assume that $g\in LSC_b(E)$ we have thus checked that $\tilde v\mapsto g+\Gamma_{\tilde v}$ maps $LSC_b([0,T]\times E)$ into itself. To prove that it has a unique fixed point we note the easy estimate based on (2.4), valid for every $\tilde v', \tilde v'' \in LSC_b([0,T]\times E)$:


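$$\Big|\sup_{a\in A}\gamma_{\tilde v'}(t,x,a) - \sup_{a\in A}\gamma_{\tilde v''}(t,x,a)\Big| \le \sup_{a\in A}\big|\gamma_{\tilde v'}(t,x,a) - \gamma_{\tilde v''}(t,x,a)\big|$$

$$\qquad \le \sup_{a\in A}\Big(\int_E |\tilde v'(t,y) - \tilde v''(t,y)|\,\lambda(x,a,dy) + |\tilde v'(t,x) - \tilde v''(t,x)|\,(\Lambda - \lambda(x,a,E))\Big)$$

$$\qquad \le 2\Lambda\,\|\tilde v' - \tilde v''\|_\infty.$$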


By a standard technique one proves that a suitable iteration of the map $\tilde v\mapsto g+\Gamma_{\tilde v}$ is a contraction with respect to the distance induced by the supremum norm, and hence that map has a unique fixed point. □
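The fixed-point argument above can be tried numerically. The following is a minimal sketch, assuming finite state and action spaces and a uniform time grid (neither is part of the paper's setting); the function name `solve_hjb_picard` and its parameters are hypothetical. It iterates the integral map for $\tilde v$ with the terminal value $e^{-\Lambda T} g$ from (2.14), approximating the time integral by the trapezoidal rule, and then undoes the change of unknown.

```python
import math

def solve_hjb_picard(rates, f, g, T, n_steps=200, n_iter=60):
    """Approximate the HJB solution on a finite state and action space.

    rates[a][x][y] : jump rate from state x to state y under action a
    f(t, x, a)     : running gain; g[x] : terminal gain
    Returns v with v[k][x] ~ v(t_k, x), where t_k = k * T / n_steps.
    """
    n_actions, n_states = len(rates), len(g)
    dt = T / n_steps
    times = [k * dt for k in range(n_steps + 1)]
    # Lambda = sup over (x, a) of the total jump rate lambda(x, a, E).
    lam = max(sum(rates[a][x]) for a in range(n_actions) for x in range(n_states))
    terminal = [math.exp(-lam * T) * g[x] for x in range(n_states)]
    # Initial guess for w(t, x) = exp(-Lambda t) v(t, x): constant in time.
    w = [terminal[:] for _ in times]

    def sup_gamma(k, x):
        # sup over a of gamma_w(t_k, x, a), cf. (2.15).
        t = times[k]
        return max(
            sum(rates[a][x][y] * w[k][y] for y in range(n_states))
            + (lam - sum(rates[a][x])) * w[k][x]
            + math.exp(-lam * t) * f(t, x, a)
            for a in range(n_actions)
        )

    for _ in range(n_iter):
        s = [[sup_gamma(k, x) for x in range(n_states)] for k in range(n_steps + 1)]
        new_w = [terminal[:] for _ in times]
        # Trapezoidal rule for the integral of s over [t_k, T], built backwards.
        for k in range(n_steps - 1, -1, -1):
            for x in range(n_states):
                new_w[k][x] = new_w[k + 1][x] + 0.5 * dt * (s[k][x] + s[k + 1][x])
        w = new_w

    # Undo the change of unknown: v = exp(Lambda t) w.
    return [[math.exp(lam * times[k]) * w[k][x] for x in range(n_states)]
            for k in range(n_steps + 1)]
```

With a single action and $f \equiv 0$ the scheme should reproduce $v(t,x) = \mathbb E[g(X_T)]$ for the uncontrolled chain, which gives a simple sanity check on a two-state example.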


Theorem 2.7. Under the assumptions (2.4), (2.10), (2.12) there exists a unique solution $v \in USC_b([0,T]\times E)$ to the HJB equation.

Proof. The proof is almost the same as in the previous Theorem, replacing $LSC_b$ with $USC_b$ with obvious changes. We introduce $\tilde v$, $\gamma_{\tilde v}$ and $\Gamma_{\tilde v}$ as before and we prove in particular that $\gamma_{\tilde v} \in USC_b([0,T]\times E\times A)$. The only difference is that we cannot immediately conclude that $\sup_{a\in A}\gamma_{\tilde v}(\cdot,\cdot,a)$ is upper semi-continuous as well. However, at this point we can apply point 1 of Proposition 2.5 choosing $U = [0,T]\times E$, $V = A$ and $F = \gamma_{\tilde v}$ and we deduce that in fact $\sup_{a\in A}\gamma_{\tilde v}(\cdot,\cdot,a) \in USC_b([0,T]\times E)$. The rest of the proof is the same. □


Corollary 2.8. Under the assumptions (2.4), (2.10), if $f \in C_b([0,T]\times E\times A)$, $g\in C_b(E)$ and $A$ is a compact metric space then the solution $v$ to the HJB equation belongs to $C_b([0,T]\times E)$.

The Corollary follows immediately from the two previous results. We proceed to a verification 
theorem for the HJB equation. 

Theorem 2.9. Under the assumptions (2.4), (2.10), (2.11) the unique solution $v \in LSC_b([0,T]\times E)$ to the HJB equation coincides with the value function $V$.

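Proof. Let us fix $(t,x)\in[0,T]\times E$. As in the proof of Proposition 2.1 we have the identity

$$g(X_T) - v(t,X_t) = \int_t^T \frac{\partial v}{\partial r}(r,X_r)\,dr + \int_{(t,T]}\int_E \big(v(r,y) - v(r,X_{r-})\big)\,p(dr\,dy),$$

which follows from the absolute continuity of $t\mapsto v(t,x)$, taking into account that $X$ is constant between jump times and using the definition of the random measure $p$. Given an arbitrary admissible control $\alpha\in\mathcal A_{ad}$ we take the expectation with respect to the corresponding probability $\mathbb P^{t,x}_\alpha$. Recalling that the compensator under $\mathbb P^{t,x}_\alpha$ is $1_{[t,\infty)}(s)\,\lambda(X_{s-},\alpha(s,X_{s-}),dy)\,ds$ and that $X_t = x$ almost surely, we obtain

$$\mathbb E^{t,x}_\alpha[g(X_T)] - v(t,x) = \mathbb E^{t,x}_\alpha\Big[\int_t^T \frac{\partial v}{\partial r}(r,X_r)\,dr + \int_{(t,T]}\int_E \big(v(r,y) - v(r,X_{r-})\big)\,\lambda(X_{r-},\alpha(r,X_{r-}),dy)\,dr\Big]$$

$$\qquad\qquad = \mathbb E^{t,x}_\alpha\Big[\int_t^T \Big\{\frac{\partial v}{\partial r}(r,X_r) + \mathcal L_E^{\alpha(r,X_r)} v(r,X_r)\Big\}\,dr\Big].$$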
Adding $\mathbb E^{t,x}_\alpha\big[\int_t^T f(r,X_r,\alpha(r,X_r))\,dr\big]$ to both sides and rearranging terms we obtain

$$v(t,x) = J(t,x,\alpha) - \mathbb E^{t,x}_\alpha\Big[\int_t^T \Big\{\frac{\partial v}{\partial r}(r,X_r) + \mathcal L_E^{\alpha(r,X_r)} v(r,X_r) + f(r,X_r,\alpha(r,X_r))\Big\}\,dr\Big]. \tag{2.17}$$

Recalling the HJB equation and taking into account that $X$ has piecewise constant trajectories we conclude that the term in curly brackets $\{\ldots\}$ is nonpositive and therefore we have $v(t,x) \ge J(t,x,\alpha)$ for every admissible control.

Now we recall that in the proof of Theorem 2.6 we showed that the function $\gamma_{\tilde v}$ defined in (2.15) belongs to $LSC_b([0,T]\times E\times A)$. Therefore the function

$$F(t,x,a) := e^{\Lambda t}\gamma_{\tilde v}(t,x,a) = \mathcal L_E^a v(t,x) + f(t,x,a) + \Lambda\, v(t,x)$$

is also lower semi-continuous and bounded. Applying point 2 of Proposition 2.5 with $U=[0,T]\times E$ and $V=A$ we see that for every $\epsilon>0$ there exists a Borel-measurable $\alpha_\epsilon : [0,T]\times E\to A$ such that $F(t,x,\alpha_\epsilon(t,x)) \ge \sup_{a\in A}F(t,x,a) - \epsilon$ for all $t\in[0,T]$, $x\in E$. Taking into account the HJB equation we conclude that for every $x\in E$ we have

$$\mathcal L_E^{\alpha_\epsilon(t,x)} v(t,x) + f(t,x,\alpha_\epsilon(t,x)) \ge -\frac{\partial v}{\partial t}(t,x) - \epsilon$$

for almost all $t\in[0,T]$. Noting that $\alpha_\epsilon$ is an admissible control and choosing $\alpha=\alpha_\epsilon$ in (2.17) we obtain $v(t,x) \le J(t,x,\alpha_\epsilon) + \epsilon(T-t)$. Since we know that $v(t,x)\ge J(t,x,\alpha)$ for every $\alpha\in\mathcal A_{ad}$ we conclude that $v$ coincides with the value function $V$. □


Theorem 2.10. Under assumptions (2.4), (2.10), (2.12) the unique solution $v\in USC_b([0,T]\times E)$ to the HJB equation coincides with the value function $V$. Moreover there exists an optimal control $\alpha$, which is given by any function satisfying

$$\mathcal L_E^{\alpha(t,x)} v(t,x) + f(t,x,\alpha(t,x)) = \sup_{a\in A}\big(\mathcal L_E^a v(t,x) + f(t,x,a)\big). \tag{2.18}$$

Proof. We proceed as in the previous proof, but we can now apply point 2 of Proposition 2.5 to the function $F$ and deduce that there exists a Borel-measurable $\alpha : [0,T]\times E\to A$ such that (2.18) holds. Any such control $\alpha$ is optimal: in fact we obtain for every $x\in E$,

$$\mathcal L_E^{\alpha(t,x)} v(t,x) + f(t,x,\alpha(t,x)) = -\frac{\partial v}{\partial t}(t,x)$$

for almost all $t\in[0,T]$ and so $v(t,x) = J(t,x,\alpha)$. □
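To make the selection rule (2.18) concrete, here is a small sketch for finite state and action spaces, an assumption made only for illustration. The function `greedy_feedback` and its argument layout are hypothetical; it picks, at a fixed time $t$, one maximizer of $a \mapsto \mathcal L_E^a v(t,x) + f(t,x,a)$ for each state.

```python
def greedy_feedback(v_t, rates, f_t):
    """Return, for each state x, an action maximizing L^a v(t, x) + f(t, x, a).

    v_t[x]         : value function at a fixed time t
    rates[a][x][y] : jump rate from x to y under action a
    f_t[x][a]      : running gain at time t
    """
    n_actions, n_states = len(rates), len(v_t)

    def hamiltonian(x, a):
        # L^a v(t, x) = sum over y of (v(y) - v(x)) * lambda(x, a, {y})
        gen = sum(rates[a][x][y] * (v_t[y] - v_t[x]) for y in range(n_states))
        return gen + f_t[x][a]

    # Ties are broken in favour of the smallest action index.
    return [max(range(n_actions), key=lambda a: hamiltonian(x, a))
            for x in range(n_states)]
```

On a finite action set the maximum is always attained, so no measurable selection argument is needed; in the general setting of the theorem this is exactly what Proposition 2.5 supplies.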

Remark 2.11. As already mentioned, Theorems 2.7 and 2.10 are similar to classical results: compare for instance [26], Theorems 10, 12, 13, 14. In that paper the author solves the HJB equations by means of a general result on nonlinear semigroups of operators, and for this he requires some more functional-analytic structure, for instance he embeds the set of decision rules into a properly chosen topological vector space. He also has more stringent conditions on the kernel $\lambda$, for instance $\lambda(x,a,B)$ should be strictly positive and continuous in $(x,a)$ for each fixed $B\in\mathcal E$.
3 Control randomization and dual optimal control problem

In this section we start to implement the control randomization method. In the first step, for any initial time $t\ge 0$ and starting point $x\in E$, we construct an (uncontrolled) Markovian pair of pure jump stochastic processes $(X,I)$ with values in $E\times A$ by specifying its rate transition measure $\Lambda$ as in (3.3) below. Next we formulate an auxiliary optimal control problem where, roughly speaking, we optimize a cost functional by modifying the intensity of the process $(X,I)$ over a suitable family. This "dual" control problem will be studied in section 4 by an approach based on BSDEs. In section 5 we will prove that the dual value function coincides with the one introduced in the previous section.

3.1 A randomized control system 

Let $E$, $A$ be Borel spaces with corresponding Borel $\sigma$-algebras $\mathcal E$, $\mathcal A$ and let $\lambda$ be a measure transition kernel from $(E\times A, \mathcal E\otimes\mathcal A)$ to $(E,\mathcal E)$ as before. As another basic datum we suppose we are given a finite measure $\lambda_0$ on $(A,\mathcal A)$ with full topological support, i.e., it is strictly positive on any non-empty open subset of $A$. Note that since $A$ is metric separable such a measure can always be constructed, for instance supported on a dense discrete subset of $A$. We still assume (2.4), so we formulate the following assumption:

(Hλ) $\lambda_0$ is a finite measure on $(A,\mathcal A)$ with full topological support and $\lambda$ satisfies

$$\sup_{x\in E,\, a\in A} \lambda(x,a,E) < \infty. \tag{3.1}$$

We wish to construct a Markov process as in section 2.1 but with state space $E\times A$. Accordingly, let $\Omega'$ denote the set of sequences $\omega' = (t_n,e_n,a_n)_{n\ge 1}$ contained in $((0,\infty)\times E\times A)\cup\{(\infty,\Delta,\Delta')\}$, where $\Delta\notin E$ (respectively, $\Delta'\notin A$) is adjoined to $E$ (respectively, to $A$) as an isolated point, satisfying (2.2). In the sample space $\Omega = E\times A\times\Omega'$ we define $T_n:\Omega\to(0,\infty]$, $E_n:\Omega\to E\cup\{\Delta\}$, $A_n:\Omega\to A\cup\{\Delta'\}$, as follows: writing $\omega = (e,a,\omega')$ in the form $\omega = (e,a,t_1,e_1,a_1,t_2,e_2,a_2,\ldots)$ we set for $t\ge 0$ and for $n\ge 1$

$$T_n(\omega) = t_n, \qquad T_\infty(\omega) = \lim_{n\to\infty} t_n, \qquad T_0(\omega) = 0,$$

$$E_n(\omega) = e_n, \qquad A_n(\omega) = a_n, \qquad E_0(\omega) = e, \qquad A_0(\omega) = a.$$

We also define processes $X:\Omega\times[0,\infty)\to E\cup\{\Delta\}$, $I:\Omega\times[0,\infty)\to A\cup\{\Delta'\}$ setting

$$X_t = 1_{[0,T_1]}(t)\,E_0 + \sum_{n\ge 1} 1_{(T_n,T_{n+1}]}(t)\,E_n, \qquad I_t = 1_{[0,T_1]}(t)\,A_0 + \sum_{n\ge 1} 1_{(T_n,T_{n+1}]}(t)\,A_n,$$

for $t<T_\infty$, $X_t = \Delta$ and $I_t = \Delta'$ for $t\ge T_\infty$.

In $\Omega$ we introduce for all $t\ge 0$ the $\sigma$-algebras $\mathcal G_t = \sigma(N(s,B) : s\in(0,t],\ B\in\mathcal E\otimes\mathcal A)$ generated by the counting processes $N(s,B) = \sum_{n\ge 1} 1_{T_n\le s}\,1_{(E_n,A_n)\in B}$ and the $\sigma$-algebra $\mathcal F_t$ generated by $\mathcal F_0$ and $\mathcal G_t$, where $\mathcal F_0 := \mathcal E\otimes\mathcal A\otimes\{\emptyset,\Omega'\}$. We still denote by $\mathbb F = (\mathcal F_t)_{t\ge 0}$ and $\mathcal P$ the corresponding filtration and predictable $\sigma$-algebra. By abuse of notation we also denote by the same symbol the trace of $\mathcal P$ on subsets of the form $[0,T]\times\Omega$ or $[t,T]\times\Omega$, for deterministic times $0\le t\le T<\infty$. The random measure $p$ is now defined on $(0,\infty)\times E\times A$ as

$$p(ds\,dy\,db) = \sum_{n\in\mathbb N} 1_{\{T_n<\infty\}}\,\delta_{(T_n,E_n,A_n)}(ds\,dy\,db). \tag{3.2}$$

By means of $\lambda$ and $\lambda_0$ satisfying assumption (Hλ) we define a (time-independent) rate transition measure on $E\times A$ given by

$$\Lambda(x,a;dy\,db) = \lambda(x,a,dy)\,\delta_a(db) + \lambda_0(db)\,\delta_x(dy) \tag{3.3}$$

and the corresponding generator $\mathcal L$:

$$\mathcal L\varphi(x,a) := \int_{E\times A}\big(\varphi(y,b) - \varphi(x,a)\big)\,\Lambda(x,a;dy\,db)$$

$$\qquad\qquad = \int_E \big(\varphi(y,a) - \varphi(x,a)\big)\,\lambda(x,a,dy) + \int_A \big(\varphi(x,b) - \varphi(x,a)\big)\,\lambda_0(db),$$

for all $(x,a)\in E\times A$ and every function $\varphi\in B_b(E\times A)$.

Given any starting time $t\ge 0$ and starting point $(x,a)\in E\times A$, an application of Proposition 2.1 provides a probability measure on $(\Omega,\mathcal F_\infty)$, denoted by $\mathbb P^{t,x,a}$, such that $(X,I)$ is a Markov process on the time interval $[t,\infty)$ with respect to $\mathbb F$ with transition probabilities associated to $\mathcal L$. Moreover, $\mathbb P^{t,x,a}$-a.s., $X_s = x$ and $I_s = a$ for all $s\in[0,t]$. Finally, the restriction of the measure $p$ to $(t,\infty)\times E\times A$ admits as $\mathbb F$-compensator under $\mathbb P^{t,x,a}$ the random measure

$$\tilde p(ds\,dy\,db) := \lambda_0(db)\,\delta_{\{X_{s-}\}}(dy)\,ds + \lambda(X_{s-},I_{s-},dy)\,\delta_{\{I_{s-}\}}(db)\,ds.$$

We denote by $q := p - \tilde p$ the compensated martingale measure associated to $p$.

Remark 3.1. Note that $\Lambda(x,a;\{(x,a)\}) = \lambda_0(\{a\}) + \lambda(x,a,\{x\})$. So even if we assumed that $\lambda(x,a,\{x\}) = 0$, in general the rate measure $\Lambda$ would not satisfy the corresponding condition $\Lambda(x,a;\{(x,a)\}) = 0$. We remark that imposing the additional requirement that $\lambda_0(\{a\}) = 0$ is too restrictive since, due to the assumption that $\lambda_0$ has full support, it would rule out the important case when the space of control actions $A$ is finite or countable.
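The structure of the rate measure (3.3) is easy to see in simulation: each jump changes either the state component or the action component, never both. The sketch below assumes finite $E$ and $A$ and starting time $0$ (illustrative assumptions only); `simulate_pair` and its dictionary-based inputs are hypothetical names, not part of the paper.

```python
import random

def simulate_pair(x0, a0, lam, lam0, horizon, rng):
    """Simulate the randomized pair (X, I) on [0, horizon] for finite E and A.

    lam[(x, a)] : dict y -> rate lambda(x, a, {y}) of the controlled kernel
    lam0        : dict b -> mass lambda_0({b}) of the finite measure on A
    rng         : a random.Random instance
    Returns the list of (time, x, a), starting with the initial point.
    """
    path = [(0.0, x0, a0)]
    t, x, a = 0.0, x0, a0
    total0 = sum(lam0.values())
    while True:
        x_rates = lam[(x, a)]
        total_x = sum(x_rates.values())
        total = total_x + total0  # total mass of the rate measure at (x, a)
        if total <= 0.0:
            break
        t += rng.expovariate(total)
        if t > horizon:
            break
        if rng.random() * total < total_x:
            # Jump of the X-component, drawn from lambda(x, a, .) normalized.
            x = rng.choices(list(x_rates), weights=list(x_rates.values()))[0]
        else:
            # Jump of the I-component, drawn from lambda_0 normalized.
            a = rng.choices(list(lam0), weights=list(lam0.values()))[0]
        path.append((t, x, a))
    return path
```

Since $\lambda_0$ may charge the current action, a recorded jump can leave the pair unchanged, which is the phenomenon pointed out in Remark 3.1.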


3.2 The dual optimal control problem 

We introduce a dual control problem associated to the process $(X,I)$ and formulated in a weak form. For fixed $(t,x,a)$, it consists in defining a family of probability measures $\{\mathbb P^{t,x,a}_\nu,\ \nu\in\mathcal V\}$ in the space $(\Omega,\mathcal F_\infty)$, all absolutely continuous with respect to $\mathbb P^{t,x,a}$, whose effect is to change the stochastic intensity of the process $(X,I)$ (more precisely, under each $\mathbb P^{t,x,a}_\nu$ the compensator of the associated point process takes a desired form), with the aim of maximizing a cost depending on $f$, $g$. We note that $\{\mathbb P^{t,x,a}_\nu,\ \nu\in\mathcal V\}$ is a dominated family of probability measures. We proceed with precise definitions.

We still assume that (Hλ) holds. Let us define

$$\mathcal V = \{\nu : \Omega\times[0,\infty)\times A\to(0,\infty),\ \mathcal P\otimes\mathcal A\text{-measurable and bounded}\}.$$

For every $\nu\in\mathcal V$, we consider the predictable random measure

$$\tilde p^\nu(ds\,dy\,db) := \nu_s(b)\,\lambda_0(db)\,\delta_{\{X_{s-}\}}(dy)\,ds + \lambda(X_{s-},I_{s-},dy)\,\delta_{\{I_{s-}\}}(db)\,ds. \tag{3.4}$$

Now we fix $t\in[0,T]$, $x\in E$, $a\in A$ and, with the help of a theorem of Girsanov type, we will show how to construct a probability measure on $(\Omega,\mathcal F_\infty)$, equivalent to $\mathbb P^{t,x,a}$, under which $\tilde p^\nu$ is the compensator of the measure $p$ on $(0,T]\times E\times A$. By the Radon-Nikodym theorem one can find two nonnegative functions $d_1$, $d_2$ defined on $\Omega\times[0,\infty)\times E\times A$, measurable with respect to $\mathcal P\otimes\mathcal E\otimes\mathcal A$, such that

$$\lambda_0(db)\,\delta_{\{X_{t-}\}}(dy)\,dt = d_1(t,y,b)\,\tilde p(dt\,dy\,db),$$

$$\lambda(X_{t-},I_{t-},dy)\,\delta_{\{I_{t-}\}}(db)\,dt = d_2(t,y,b)\,\tilde p(dt\,dy\,db),$$

$$d_1(t,y,b) + d_2(t,y,b) = 1, \qquad \tilde p(dt\,dy\,db)\text{-a.e.},$$

and we have $d\tilde p^\nu = (\nu\, d_1 + d_2)\,d\tilde p$. For any $\nu\in\mathcal V$, consider then the Doléans-Dade exponential local martingale $L^\nu$ defined setting $L^\nu_s = 1$ for $s\in[0,t]$ and


$$L^\nu_s = \exp\Big(\int_t^s\int_{E\times A} \log\big(\nu_r(b)\,d_1(r,y,b) + d_2(r,y,b)\big)\,p(dr\,dy\,db) - \int_t^s\int_A (\nu_r(b) - 1)\,\lambda_0(db)\,dr\Big)$$

$$\qquad = e^{\int_t^s\int_A (1-\nu_r(b))\,\lambda_0(db)\,dr} \prod_{n\ge 1:\ T_n\le s} \big(\nu_{T_n}(A_n)\,d_1(T_n,E_n,A_n) + d_2(T_n,E_n,A_n)\big),$$

for $s\in[t,T]$, where $q = p - \tilde p$. When $L^\nu$ is a true martingale, i.e., $\mathbb E^{t,x,a}[L^\nu_T] = 1$, we can define a probability measure $\mathbb P^{t,x,a}_\nu$ equivalent to $\mathbb P^{t,x,a}$ on $(\Omega,\mathcal F_\infty)$ setting $\mathbb P^{t,x,a}_\nu(d\omega) = L^\nu_T(\omega)\,\mathbb P^{t,x,a}(d\omega)$. By the Girsanov theorem for point processes ([?], Theorem 4.5) the restriction of the random measure $p$ to $(0,T]\times E\times A$ admits $\tilde p^\nu = (\nu\,d_1 + d_2)\,\tilde p$ as compensator under $\mathbb P^{t,x,a}_\nu$. We denote by $\mathbb E^{t,x,a}_\nu$ the expectation operator under $\mathbb P^{t,x,a}_\nu$ and by $q^\nu := p - \tilde p^\nu$ the compensated martingale measure of $p$ under $\mathbb P^{t,x,a}_\nu$. The validity of the condition $\mathbb E^{t,x,a}[L^\nu_T] = 1$ under our assumptions, as well as other useful properties, are proved in the following lemma.


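Lemma 3.2. Let assumption (Hλ) hold. Then, for every $t\in[0,T]$, $x\in E$, $a\in A$ and $\nu\in\mathcal V$, under the probability $\mathbb P^{t,x,a}$ the process $L^\nu$ is a martingale on $[0,T]$ and $L^\nu_T$ is square integrable.

In addition, for every $\mathcal P\otimes\mathcal E\otimes\mathcal A$-measurable function $H:\Omega\times[t,T]\times E\times A\to\mathbb R$ such that $\mathbb E^{t,x,a}\big[\int_t^T\int_{E\times A} |H_s(y,b)|^2\,\tilde p(ds\,dy\,db)\big] < \infty$, the process $\int_t^{\cdot}\int_{E\times A} H_s(y,b)\,q^\nu(ds\,dy\,db)$ is a $\mathbb P^{t,x,a}_\nu$-martingale on $[t,T]$.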


Proof. The first part of the proof is inspired by Lemma 4.1 in [21]. In particular, since $\nu$ is bounded and $\lambda_0(A)<\infty$, we see that

$$S^\nu_T = \exp\Big(\int_t^T\int_A |\nu_s(b) - 1|^2\,\lambda_0(db)\,ds\Big)$$

is bounded. Therefore, from Theorem 8, see also Theorem 9, in [?], follows the martingale property of $L^\nu$ together with its uniform integrability. Concerning the square integrability of $L^\nu_T$, set $\ell(x,\lambda) := 2\ln(x\lambda + 1 - \lambda) - \ln(x^2\lambda + 1 - \lambda)$, for any $x\ge 0$ and $\lambda\in[0,1]$. From the definition of $L^\nu$ we have (recalling that $d_2(s,y,b) = 1 - d_1(s,y,b)$)

$$|L^\nu_T|^2 = L^{\nu^2}_T\, S^\nu_T\, \exp\Big(\int_t^T\int_{E\times A} \ell\big(\nu_s(b), d_1(s,y,b)\big)\,p(ds\,dy\,db)\Big) \le L^{\nu^2}_T\, S^\nu_T,$$

where the last inequality follows from the fact that $\ell$ is nonpositive. This entails that $L^\nu_T$ is square integrable.

Let us finally fix a predictable function $H$ such that $\mathbb E^{t,x,a}\big[\int_t^T\int_{E\times A} |H_s(y,b)|^2\,\tilde p(ds\,dy\,db)\big] < \infty$. The process $\int_t^{\cdot}\int_{E\times A} H_s(y,b)\,q^\nu(ds\,dy\,db)$ is a $\mathbb P^{t,x,a}_\nu$-local martingale, and the uniform integrability follows from the Burkholder-Davis-Gundy and Cauchy-Schwarz inequalities, together with the square integrability of $L^\nu_T$. □


To complete the formulation of the dual optimal control problem we specify the conditions that we will assume for the cost functions $f$, $g$:

(Hfg) $f\in B_b([0,T]\times E\times A)$ and $g\in B_b(E)$.

For every $t\in[0,T]$, $x\in E$, $a\in A$ and $\nu\in\mathcal V$ we finally introduce the dual gain functional

$$J(t,x,a,\nu) = \mathbb E^{t,x,a}_\nu\Big[g(X_T) + \int_t^T f(s,X_s,I_s)\,ds\Big],$$

and the dual value function

$$v^*(t,x,a) = \sup_{\nu\in\mathcal V} J(t,x,a,\nu). \tag{3.5}$$

Remark 3.3. An interpretation of the dual optimal control problem can be given as follows. It can be proved that, under $\mathbb P^{t,x,a}_\nu$, the jump times of $I$ and $X$, denoted by $\{S_n\}$ and $\{R_n\}$ respectively, are disjoint. The compensators of the corresponding random measures $\mu^I(ds\,db) = \sum_n \delta_{(S_n, I_{S_n})}(ds\,db)$ on $(0,\infty)\times A$ and $\mu^X(ds\,dy) = \sum_n \delta_{(R_n, X_{R_n})}(ds\,dy)$ on $(0,\infty)\times E$ are

$$\tilde\mu^I(ds\,db) = \nu_s(b)\,\lambda_0(db)\,ds, \qquad \tilde\mu^X(ds\,dy) = \lambda(X_{s-},I_{s-},dy)\,ds.$$

Thus, the effect of choosing $\nu$ is to change the intensity of the $I$-component. We leave the proofs of these facts to the reader since they will not be used in the sequel.


4 Constrained BSDE and representation of the dual value function

In this section we introduce a BSDE, with a sign constraint on its martingale part, and prove existence and uniqueness of a minimal solution, in an appropriate sense. The BSDE is then used to give a representation formula for the dual value function introduced above.

Throughout this section we assume that the assumptions (Hλ) and (Hfg) are satisfied and we use the randomized control setting introduced above: $\Omega$, $\mathbb F$, $X$, $\mathbb P^{t,x,a}$ as well as the random measures $p$, $\tilde p$, $q$ are the same as in subsection 3.1. For any $(t,x,a)\in[0,T]\times E\times A$, we introduce the following notation.

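• $\mathbf L^2(\lambda_0)$, the set of $\mathcal A$-measurable maps $\psi : A\to\mathbb R$ such that

$$|\psi|^2_{\mathbf L^2(\lambda_0)} := \int_A |\psi(b)|^2\,\lambda_0(db) < \infty.$$

• $\mathbf L^2_{t,x,a}(\mathcal F_\tau)$, the set of $\mathcal F_\tau$-measurable random variables $X$ such that $\mathbb E^{t,x,a}[|X|^2] < \infty$; here $\tau$ is an $\mathbb F$-stopping time with values in $[t,T]$.

• $\mathbf S^2_{t,x,a}$, the set of real valued càdlàg adapted processes $Y = (Y_s)_{t\le s\le T}$ such that

$$\|Y\|^2_{\mathbf S^2_{t,x,a}} := \mathbb E^{t,x,a}\Big[\sup_{t\le s\le T} |Y_s|^2\Big] < \infty.$$

• $\mathbf L^2_{t,x,a}(q)$, the set of $\mathcal P\otimes\mathcal E\otimes\mathcal A$-measurable maps $Z:\Omega\times[t,T]\times E\times A\to\mathbb R$ such that

$$\|Z\|^2_{\mathbf L^2_{t,x,a}(q)} := \mathbb E^{t,x,a}\Big[\int_t^T\int_{E\times A} |Z_s(y,b)|^2\,\tilde p(ds\,dy\,db)\Big]$$

$$\qquad = \mathbb E^{t,x,a}\Big[\int_t^T\int_E |Z_s(y,I_s)|^2\,\lambda(X_s,I_s,dy)\,ds + \int_t^T\int_A |Z_s(X_s,b)|^2\,\lambda_0(db)\,ds\Big] < \infty.$$

• $\mathbf K^2_{t,x,a}$, the set of nondecreasing predictable processes $K = (K_s)_{t\le s\le T}\in\mathbf S^2_{t,x,a}$ with $K_t = 0$, with the induced norm

$$\|K\|^2_{\mathbf S^2_{t,x,a}} = \mathbb E^{t,x,a}\big[|K_T|^2\big].$$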
We are interested in studying the following family of BSDEs parametrized by $(t,x,a)$: $\mathbb P^{t,x,a}$-a.s.,

$$Y^{t,x,a}_s = g(X_T) + \int_s^T f(r,X_r,I_r)\,dr + K^{t,x,a}_T - K^{t,x,a}_s - \int_s^T\int_{E\times A} Z^{t,x,a}_r(y,b)\,q(dr\,dy\,db) - \int_s^T\int_A Z^{t,x,a}_r(X_r,b)\,\lambda_0(db)\,dr, \quad s\in[t,T], \tag{4.1}$$

with the sign constraint

$$Z^{t,x,a}_s(X_{s-},b) \le 0, \qquad ds\otimes d\mathbb P^{t,x,a}\otimes\lambda_0(db)\text{-a.e. on } [t,T]\times\Omega\times A. \tag{4.2}$$

This constraint can be seen as a sign condition imposed on the jumps of the corresponding stochastic integral.

Definition 4.1. A solution to the equation (4.1)-(4.2) is a triple $(Y,Z,K)\in\mathbf S^2_{t,x,a}\times\mathbf L^2_{t,x,a}(q)\times\mathbf K^2_{t,x,a}$ that satisfies (4.1)-(4.2).

A solution $(Y,Z,K)$ is called minimal if for any other solution $(\bar Y,\bar Z,\bar K)$ we have, $\mathbb P^{t,x,a}$-a.s.,

$$Y_s \le \bar Y_s, \qquad s\in[t,T].$$


Proposition 4.1. Under assumptions (Hλ) and (Hfg), for any $(t,x,a)\in[0,T]\times E\times A$, if there exists a minimal solution on $(\Omega,\mathcal F,\mathbb F,\mathbb P^{t,x,a})$ to the BSDE (4.1)-(4.2), then it is unique.

Proof. Let $(Y,Z,K)$ and $(Y',Z',K')$ be two minimal solutions of (4.1)-(4.2). The component $Y$ is unique by definition, and the difference between the two backward equations gives: $\mathbb P^{t,x,a}$-a.s.

$$\int_t^s\int_{E\times A} \big(Z_r(y,b) - Z'_r(y,b)\big)\,p(dr\,dy\,db) = K_s - K'_s + \int_t^s\int_E \big(Z_r(y,I_{r-}) - Z'_r(y,I_{r-})\big)\,\lambda(X_{r-},I_{r-},dy)\,dr, \qquad \forall\, t\le s\le T.$$

The right-hand side is a predictable process, in particular it has no totally inaccessible jumps (see, e.g., Proposition 2.24, Chapter I, in [12]), while the left-hand side is a pure jump process with totally inaccessible jumps, unless $Z = Z'$. This implies the uniqueness of the component $Z$, and as a consequence the component $K$ is unique as well. □


We now state the main result of the section. 

Theorem 4.2. Under the assumptions (Hλ) and (Hfg), for all $(t,x,a)\in[0,T]\times E\times A$ there exists a unique minimal solution $Y^{t,x,a}$ to (4.1)-(4.2). Moreover, for all $s\in[t,T]$, $Y^{t,x,a}_s$ has the explicit representation:

$$Y^{t,x,a}_s = \operatorname*{ess\,sup}_{\nu\in\mathcal V}\ \mathbb E^{t,x,a}_\nu\Big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr\ \Big|\ \mathcal F_s\Big]. \tag{4.3}$$

In particular, setting $s = t$, we have the following representation formula for the value function of the dual control problem:

$$v^*(t,x,a) = Y^{t,x,a}_t, \qquad (t,x,a)\in[0,T]\times E\times A. \tag{4.4}$$

The rest of this section is devoted to proving Theorem 4.2. To this end we will use a penalization approach presented in the following subsections. Here we only note that for the solvability of the BSDE the use of the filtration $\mathbb F$ introduced above is essential, since it involves application of martingale representation theorems for multivariate point processes (see e.g. Theorem 5.4 in [?]).
4.1 Penalized BSDE and associated dual control problem

Let us consider the family of penalized BSDEs associated to (4.1)-(4.2), parametrized by the integer $n\ge 1$:

$$Y^{n,t,x,a}_s = g(X_T) + \int_s^T f(r,X_r,I_r)\,dr + K^{n,t,x,a}_T - K^{n,t,x,a}_s - \int_s^T\int_{E\times A} Z^{n,t,x,a}_r(y,b)\,q(dr\,dy\,db) - \int_s^T\int_A Z^{n,t,x,a}_r(X_r,b)\,\lambda_0(db)\,dr, \quad s\in[t,T], \tag{4.5}$$

where $K^n$ is the nondecreasing process in $\mathbf K^2_{t,x,a}$ defined by

$$K^n_s = n\int_t^s\int_A [Z^n_r(X_r,b)]^+\,\lambda_0(db)\,dr.$$

Here we denote by $[u]^+$ the positive part of $u$. The penalized BSDE (4.5) can be rewritten in the equivalent form: $\mathbb P^{t,x,a}$-a.s.,

$$Y^{n,t,x,a}_s = g(X_T) + \int_s^T f^n\big(r,X_r,I_r,Z^{n,t,x,a}_r(X_r,\cdot)\big)\,dr - \int_s^T\int_{E\times A} Z^{n,t,x,a}_r(y,b)\,q(dr\,dy\,db), \quad s\in[t,T],$$

where the generator $f^n$ is defined by

$$f^n(t,x,a,\psi) := f(t,x,a) + \int_A \big\{n[\psi(b)]^+ - \psi(b)\big\}\,\lambda_0(db), \tag{4.6}$$

for all $(t,x,a)$ in $[0,T]\times E\times A$, and $\psi\in\mathbf L^2(\lambda_0)$. We note that under (Hλ) and (Hfg) $f^n$ is Lipschitz continuous in $\psi$ with respect to the norm of $\mathbf L^2(\lambda_0)$, uniformly in $(t,x,a)$, i.e., for every $n\in\mathbb N$ there exists a constant $L_n$ depending only on $n$ such that for every $(t,x,a)\in[0,T]\times E\times A$ and $\psi,\psi'\in\mathbf L^2(\lambda_0)$,

$$|f^n(t,x,a,\psi') - f^n(t,x,a,\psi)| \le L_n\,|\psi - \psi'|_{\mathbf L^2(\lambda_0)}.$$
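The Lipschitz property of the generator can be checked by hand when the action set is finite. The sketch below is an illustration under that assumption; the helper names are hypothetical, and the explicit constant $\max(n-1,1)\sqrt{\lambda_0(A)}$ is one admissible choice of $L_n$ obtained here from the slopes of $u\mapsto n[u]^+ - u$ and the Cauchy-Schwarz inequality, not a value stated in the paper.

```python
import math

def penalized_generator(f_value, psi, lam0, n):
    """f^n(t, x, a, psi) of (4.6) for a finite action set.

    f_value : the number f(t, x, a)
    psi     : dict b -> psi(b)
    lam0    : dict b -> lambda_0({b})
    """
    return f_value + sum((n * max(psi[b], 0.0) - psi[b]) * lam0[b] for b in lam0)

def l2_norm(psi, lam0):
    # Norm of psi in L^2(lambda_0).
    return math.sqrt(sum(psi[b] ** 2 * lam0[b] for b in lam0))

def lipschitz_constant(lam0, n):
    # u -> n*max(u, 0) - u has slopes n - 1 and -1; Cauchy-Schwarz
    # converts the L^1(lambda_0) bound into an L^2(lambda_0) bound.
    return max(n - 1, 1) * math.sqrt(sum(lam0.values()))
```

The constant grows linearly in $n$, which is why estimates that are uniform in $n$ have to come from the dual representation rather than from the Lipschitz bound.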

The use of the natural filtration $\mathbb F$ allows us to use well known integral representation results for $\mathbb F$-martingales (see, e.g., Theorem 5.4 in [?]) and we have the following proposition, whose proof is standard and is therefore omitted (similar proofs can be found in [29] Theorem 3.2, [2] Proposition 3.2, [8] Theorem 3.4).

Proposition 4.3. Let assumptions (Hλ) and (Hfg) hold. For every initial condition $(t,x,a)\in[0,T]\times E\times A$, and for every $n\in\mathbb N$, there exists a unique solution $(Y^{n,t,x,a}_s, Z^{n,t,x,a}_s)_{s\in[t,T]}\in\mathbf S^2_{t,x,a}\times\mathbf L^2_{t,x,a}(q)$ satisfying the penalized BSDE (4.5).

Next we show that the solution to the penalized BSDE (4.5) provides an explicit representation of the value function of a corresponding dual control problem depending on $n$. This is the content of Lemma 4.4, which will allow us to deduce some estimates uniform with respect to $n$.

For every $n\ge 1$, let $\mathcal V^n$ denote the subset of elements $\nu\in\mathcal V$ that take values in $(0,n]$.

Lemma 4.4. Let assumptions (Hλ) and (Hfg) hold. For all $n\ge 1$ and $s\in[t,T]$,

$$Y^{n,t,x,a}_s = \operatorname*{ess\,sup}_{\nu\in\mathcal V^n}\ \mathbb E^{t,x,a}_\nu\Big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr\ \Big|\ \mathcal F_s\Big], \qquad \mathbb P^{t,x,a}\text{-a.s.} \tag{4.7}$$
Proof. We fix $n\ge 1$ and for any $\nu\in\mathcal V^n$ we introduce the compensated martingale measure $q^\nu(ds\,dy\,db) = q(ds\,dy\,db) - (\nu_s(b) - 1)\,d_1(s,y,b)\,\tilde p(ds\,dy\,db)$ under $\mathbb P^{t,x,a}_\nu$. We see that the solution $(Y^n,Z^n)$ to the BSDE (4.5) satisfies: $\mathbb P^{t,x,a}$-a.s.,

$$Y^n_s = g(X_T) + \int_s^T f(r,X_r,I_r)\,dr + \int_s^T\int_A \big\{n[Z^n_r(X_r,b)]^+ - \nu_r(b)\,Z^n_r(X_r,b)\big\}\,\lambda_0(db)\,dr - \int_s^T\int_{E\times A} Z^n_r(y,b)\,q^\nu(dr\,dy\,db), \quad s\in[t,T]. \tag{4.8}$$


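By taking conditional expectation in (4.8) under $\mathbb P^{t,x,a}_\nu$ and applying Lemma 3.2 we get, for any $s\in[t,T]$,

$$Y^{n,t,x,a}_s = \mathbb E^{t,x,a}_\nu\Big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr\ \Big|\ \mathcal F_s\Big] + \mathbb E^{t,x,a}_\nu\Big[\int_s^T\int_A \big\{n[Z^{n,t,x,a}_r(X_r,b)]^+ - \nu_r(b)\,Z^{n,t,x,a}_r(X_r,b)\big\}\,\lambda_0(db)\,dr\ \Big|\ \mathcal F_s\Big], \tag{4.9}$$

$\mathbb P^{t,x,a}_\nu$-a.s. From the elementary numerical inequality $n[u]^+ - \nu u \ge 0$ for all $u\in\mathbb R$, $\nu\in(0,n]$, we deduce by (4.9) that

$$Y^{n,t,x,a}_s \ge \operatorname*{ess\,sup}_{\nu\in\mathcal V^n}\ \mathbb E^{t,x,a}_\nu\Big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr\ \Big|\ \mathcal F_s\Big]. \tag{4.10}$$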

On the other hand, for $\epsilon\in(0,1)$, let us consider the process $\nu^\epsilon\in\mathcal V^n$ defined by

$$\nu^\epsilon_s(b) = n\,1_{\{Z^{n,t,x,a}_s(X_{s-},b)\ge 0\}} + \epsilon\,1_{\{-1<Z^{n,t,x,a}_s(X_{s-},b)<0\}} - \epsilon\,Z^{n,t,x,a}_s(X_{s-},b)^{-1}\,1_{\{Z^{n,t,x,a}_s(X_{s-},b)\le -1\}}.$$

By construction, we have

$$n[Z^{n,t,x,a}_s(X_{s-},b)]^+ - \nu^\epsilon_s(b)\,Z^{n,t,x,a}_s(X_{s-},b) \le \epsilon, \qquad s\in[t,T],\ b\in A,$$

and thus for the choice of $\nu = \nu^\epsilon$ in (4.9):

$$Y^{n,t,x,a}_s \le \mathbb E^{t,x,a}_{\nu^\epsilon}\Big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr\ \Big|\ \mathcal F_s\Big] + \epsilon\,T\,\lambda_0(A)$$

$$\qquad \le \operatorname*{ess\,sup}_{\nu\in\mathcal V^n}\ \mathbb E^{t,x,a}_\nu\Big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr\ \Big|\ \mathcal F_s\Big] + \epsilon\,T\,\lambda_0(A).$$

Together with (4.10), this is enough to prove the required representation of $Y^n$. Note that we could not take $\nu_s(b) = n\,1_{\{Z^n_s(X_{s-},b)\ge 0\}}$, since this process does not belong to $\mathcal V^n$ because of the requirement of strict positivity. □
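The pointwise construction of $\nu^\epsilon$ in the proof above can be verified directly as a function of the scalar $z = Z^n_s(X_{s-},b)$. The following sketch uses hypothetical helper names and simply evaluates the three cases of the definition, together with the integrand of the second term in (4.9).

```python
def nu_eps(z, n, eps):
    """Value of the epsilon-optimal intensity as a function of z = Z(X, b)."""
    if z >= 0.0:
        return float(n)
    if z > -1.0:
        return eps
    return -eps / z  # here z <= -1, so the value lies in (0, eps]

def penalty_gap(z, n, eps):
    # n [z]^+ - nu z, the integrand appearing in (4.9)
    return n * max(z, 0.0) - nu_eps(z, n, eps) * z
```

For every real $z$, every integer $n\ge 1$ and every $\epsilon\in(0,1)$ the returned intensity is strictly positive and at most $n$, and the gap lies in $[0,\epsilon]$, which is exactly what the upper bound in the proof uses.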


4.2 Limit behavior of the penalized BSDEs and conclusion of the proof of Theorem 4.2

As a consequence of the representation (4.7) we immediately obtain the following estimates:

Lemma 4.5. Let assumptions (Hλ) and (Hfg) hold. There exists a constant $C$, depending only on $T$, $f$, $g$, such that for any $(t,x,a)\in[0,T]\times E\times A$ and $n\ge 1$, $\mathbb P^{t,x,a}$-a.s.,

$$Y^{n,t,x,a}_s \le Y^{n+1,t,x,a}_s, \qquad |Y^{n,t,x,a}_s| \le C, \qquad s\in[t,T].$$

Proof. For fixed $s\in[t,T]$, the almost sure monotonicity of $Y^{n,t,x,a}$ follows from the representation formula (4.7), since by definition $\mathcal V^n\subset\mathcal V^{n+1}$; moreover, the same formula shows that we can take $C = \|g\|_\infty + T\|f\|_\infty$. Finally, these inequalities hold for every $s\in[t,T]$ outside a null set, since the processes $Y^{n,t,x,a}$ are càdlàg. □


Moreover, the following a priori uniform estimate on the sequence $(Y^{n,t,x,a}, Z^{n,t,x,a}, K^{n,t,x,a})$ holds:

Lemma 4.6. Let assumptions (Hλ) and (Hfg) hold. For all $(t,x,a)\in[0,T]\times E\times A$ and $n\in\mathbb N$, there exists a positive constant $C'$ depending only on $T$, $f$, $g$ such that

$$\|Y^{n,t,x,a}\|^2_{\mathbf S^2_{t,x,a}} + \|Z^{n,t,x,a}\|^2_{\mathbf L^2_{t,x,a}(q)} + \|K^{n,t,x,a}\|^2_{\mathbf K^2_{t,x,a}} \le C'. \tag{4.11}$$

Proof. In the following we omit for simplicity of notation the dependence on $(t,x,a)$ for the triple $(Y^{n,t,x,a}, Z^{n,t,x,a}, K^{n,t,x,a})$. The estimate on $Y^n$ follows immediately from the previous lemma:

$$\|Y^n\|^2_{\mathbf S^2_{t,x,a}} = \mathbb E^{t,x,a}\Big[\sup_{s\in[t,T]} |Y^n_s|^2\Big] \le C^2. \tag{4.12}$$

Next we notice that, since $K^n$ is continuous, the jumps of $Y^n$ are given by the formula

$$\Delta Y^n_s = \int_{E\times A} Z^n_s(y,b)\,p(\{s\},dy\,db).$$

The Itô formula applied to $|Y^n_r|^2$ gives:

$$d|Y^n_r|^2 = 2\,Y^n_{r-}\,dY^n_r + |\Delta Y^n_r|^2$$

$$\qquad = -2\,Y^n_{r-}\,f(r,X_{r-},I_{r-})\,dr - 2\,Y^n_{r-}\,dK^n_r + 2\,Y^n_{r-}\int_{E\times A} Z^n_r(y,b)\,q(dr\,dy\,db) + 2\,Y^n_{r-}\int_A Z^n_r(X_{r-},b)\,\lambda_0(db)\,dr + \int_{E\times A} |Z^n_r(y,b)|^2\,p(\{r\},dy\,db). \tag{4.13}$$


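Integrating (4.13) on $[s,T]$, for every $s\in[t,T]$, and recalling the elementary inequality $2ab \le \frac{1}{\delta}a^2 + \delta b^2$ for any constant $\delta>0$, and that

$$\mathbb E^{t,x,a}\Big[\int_s^T\int_A |Z^n_r(X_{r-},b)|^2\,\lambda_0(db)\,dr\Big] \le \mathbb E^{t,x,a}\Big[\int_s^T\int_{E\times A} |Z^n_r(y,b)|^2\,\tilde p(dr\,dy\,db)\Big], \tag{4.14}$$

we have:

$$\mathbb E^{t,x,a}\big[|Y^n_s|^2\big] + \mathbb E^{t,x,a}\Big[\int_s^T\int_{E\times A} |Z^n_r(y,b)|^2\,\tilde p(dr\,dy\,db)\Big]$$

$$\quad \le \mathbb E^{t,x,a}\big[|g(X_T)|^2\big] + \frac{1}{\beta}\,\mathbb E^{t,x,a}\Big[\int_s^T |f(r,X_r,I_r)|^2\,dr\Big] + \beta\,\mathbb E^{t,x,a}\Big[\int_s^T |Y^n_r|^2\,dr\Big]$$

$$\quad + \frac{\lambda_0(A)}{\gamma}\,\mathbb E^{t,x,a}\Big[\int_s^T\int_{E\times A} |Z^n_r(y,b)|^2\,\tilde p(dr\,dy\,db)\Big] + \gamma\,\mathbb E^{t,x,a}\Big[\int_s^T |Y^n_r|^2\,dr\Big]$$

$$\quad + \frac{1}{\alpha}\,\mathbb E^{t,x,a}\Big[\sup_{s\in[t,T]} |Y^n_s|^2\Big] + \alpha\,\mathbb E^{t,x,a}\big[|K^n_T - K^n_s|^2\big], \qquad s\in[t,T], \tag{4.15}$$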
for some $\alpha,\beta,\gamma>0$. Now, from the equation (4.5) we obtain:

$$K^n_T - K^n_s = Y^n_s - g(X_T) - \int_s^T f(r,X_r,I_r)\,dr + \int_s^T\int_A Z^n_r(X_r,b)\,\lambda_0(db)\,dr + \int_s^T\int_{E\times A} Z^n_r(y,b)\,q(dr\,dy\,db), \qquad s\in[t,T].$$

Next we note the equality

$$\mathbb E^{t,x,a}\Big[\Big|\int_s^T\int_{E\times A} Z^n_r(y,b)\,q(dr\,dy\,db)\Big|^2\Big] = \mathbb E^{t,x,a}\Big[\int_s^T\int_{E\times A} |Z^n_r(y,b)|^2\,p(dr\,dy\,db)\Big] = \mathbb E^{t,x,a}\Big[\int_s^T\int_{E\times A} |Z^n_r(y,b)|^2\,\tilde p(dr\,dy\,db)\Big],$$

that can be proved applying the Itô formula as before to the square of the martingale $u\mapsto\int_s^u\int_{E\times A} Z^n_r(y,b)\,q(dr\,dy\,db)$, $u\in[s,T]$ (or by considering its quadratic variation). Recalling again (4.14) we see that there exists some positive constant $B$ such that

$$\mathbb E^{t,x,a}\big[|K^n_T - K^n_s|^2\big] \le B\Big(\mathbb E^{t,x,a}\big[|Y^n_s|^2\big] + \mathbb E^{t,x,a}\big[|g(X_T)|^2\big] + \mathbb E^{t,x,a}\Big[\int_s^T |f(r,X_r,I_r)|^2\,dr\Big] + \mathbb E^{t,x,a}\Big[\int_s^T\int_{E\times A} |Z^n_r(y,b)|^2\,\tilde p(dr\,dy\,db)\Big]\Big), \qquad s\in[t,T]. \tag{4.16}$$

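Plugging (4.16) into (4.15), and recalling the uniform estimate (4.12) on $Y^n$, we get

$$(1-\alpha B)\,\mathbb E^{t,x,a}\big[|Y^n_s|^2\big] + \Big(1 - \alpha B - \frac{\lambda_0(A)}{\gamma}\Big)\,\mathbb E^{t,x,a}\Big[\int_s^T\int_{E\times A} |Z^n_r(y,b)|^2\,\tilde p(dr\,dy\,db)\Big]$$

$$\quad \le (1+\alpha B)\,\mathbb E^{t,x,a}\big[|g(X_T)|^2\big] + \Big(\alpha B + \frac{1}{\beta}\Big)\,\mathbb E^{t,x,a}\Big[\int_s^T |f(r,X_r,I_r)|^2\,dr\Big] + \frac{C^2}{\alpha} + (\gamma+\beta)\,\mathbb E^{t,x,a}\Big[\int_s^T |Y^n_r|^2\,dr\Big], \qquad s\in[t,T].$$

Hence, by choosing $\alpha\in(0,\tfrac{1}{B})$, $\gamma > \frac{\lambda_0(A)}{1-\alpha B}$, $\beta>0$, and applying Gronwall's lemma to $s\mapsto\mathbb E^{t,x,a}[|Y^n_s|^2]$, we obtain:

$$\sup_{s\in[t,T]} \mathbb E^{t,x,a}\big[|Y^n_s|^2\big] + \mathbb E^{t,x,a}\Big[\int_t^T\int_{E\times A} |Z^n_s(y,b)|^2\,\tilde p(ds\,dy\,db)\Big] \le C'\Big(\mathbb E^{t,x,a}\big[|g(X_T)|^2\big] + \mathbb E^{t,x,a}\Big[\int_t^T |f(s,X_s,I_s)|^2\,ds\Big] + C^2\Big), \tag{4.17}$$

for some $C'>0$ depending only on $T$, which gives the required uniform estimate for $(Z^n)$ and also $(K^n)$ by (4.16). □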

We can finally present the conclusion of the proof of Theorem 4.2.

Proof. Let $(t,x,a)\in[0,T]\times E\times A$. We first show that $(Y^n,Z^n,K^n)$ (we omit the dependence on $(t,x,a)$ for simplicity of notation), solution to (4.5), converges in a suitable way to some process $(Y,Z,K)$, solution to the constrained BSDE (4.1)-(4.2). By Lemma 4.5, $(Y^n)_n$ converges increasingly to some adapted process $Y$, which moreover satisfies $\mathbb E^{t,x,a}\big[\sup_{s\in[t,T]} |Y_s|^2\big] < \infty$ by the uniform estimate for $(Y^n)_n$ in Lemma 4.6 and Fatou's lemma. Furthermore, by the dominated convergence theorem, we also have $\mathbb E^{t,x,a}\int_t^T |Y^n_s - Y_s|^2\,ds\to 0$. Next, we prove that there exists $(Z,K)\in\mathbf L^2_{t,x,a}(q)\times\mathbf K^2_{t,x,a}$ with $K$ predictable, such that

(i) $Z$ is the weak limit of $(Z^n)_n$ in $\mathbf L^2_{t,x,a}(q)$;

(ii) $K_\tau$ is the weak limit of $(K^n_\tau)_n$ in $\mathbf L^2_{t,x,a}(\mathcal F_\tau)$, for any stopping time $\tau$ valued in $[t,T]$;

(iii) $\mathbb P^{t,x,a}$-a.s.,

$$Y_s = g(X_T) + \int_s^T f(r,X_r,I_r)\,dr + K_T - K_s - \int_s^T\int_{E\times A} Z_r(y,b)\,q(dr\,dy\,db) - \int_s^T\int_A Z_r(X_r,b)\,\lambda_0(db)\,dr, \qquad s\in[t,T],$$

with

$$Z_s(X_{s-},b) \le 0, \qquad ds\otimes d\mathbb P^{t,x,a}\otimes\lambda_0(db)\text{-a.e.}$$

Let us define the following mappings from $\mathbf L^2_{t,x,a}(q)$ to $\mathbf L^2_{t,x,a}(\mathcal F_\tau)$:

$$I^1_\tau : Z\mapsto \int_t^\tau\int_{E\times A} Z_s(y,b)\,q(ds\,dy\,db), \qquad I^2_\tau : Z\mapsto \int_t^\tau\int_A Z_s(X_s,b)\,\lambda_0(db)\,ds,$$

for each $\mathbb F$-stopping time $\tau$ with values in $[t,T]$. We wish to prove that $I^1_\tau Z^n$ and $I^2_\tau Z^n$ converge weakly in $\mathbf L^2_{t,x,a}(\mathcal F_\tau)$ to $I^1_\tau Z$ and $I^2_\tau Z$ respectively. Indeed, by the uniform estimates for $(Z^n)_n$ in Lemma 4.6 there exists a subsequence, denoted $(Z^{n_k})_k$, which converges weakly in $\mathbf L^2_{t,x,a}(q)$. Since $I^1_\tau$ and $I^2_\tau$ are linear continuous operators they are also weakly continuous, so that we have $I^1_\tau Z^{n_k}\rightharpoonup I^1_\tau Z$ and $I^2_\tau Z^{n_k}\rightharpoonup I^2_\tau Z$ weakly in $\mathbf L^2_{t,x,a}(\mathcal F_\tau)$ as $k\to\infty$. Since we have from (4.5)

$$K^{n_k}_\tau = -Y^{n_k}_\tau + Y^{n_k}_t - \int_t^\tau f(r,X_r,I_r)\,dr + \int_t^\tau\int_A Z^{n_k}_r(X_r,b)\,\lambda_0(db)\,dr + \int_t^\tau\int_{E\times A} Z^{n_k}_r(y,b)\,q(dr\,dy\,db),$$

we also obtain the weak convergence in $\mathbf L^2_{t,x,a}(\mathcal F_\tau)$ as $k\to\infty$

$$K^{n_k}_\tau \rightharpoonup K_\tau := -Y_\tau + Y_t - \int_t^\tau f(r,X_r,I_r)\,dr + \int_t^\tau\int_A Z_r(X_r,b)\,\lambda_0(db)\,dr + \int_t^\tau\int_{E\times A} Z_r(y,b)\,q(dr\,dy\,db). \tag{4.18}$$

Arguing as in [24] proof of Theorem 2.1, or [?] Lemma 3.5, [15] Theorem 3.1, we see that $K$ inherits from $K^{n_k}$ the properties of having nondecreasing paths and of being square integrable and predictable. Finally, from Lemma 2.2 in [23] it follows that $K$ and $Y$ are càdlàg, so that $K\in\mathbf K^2_{t,x,a}$ and $Y\in\mathbf S^2_{t,x,a}$.
Notice that the processes $Z$ and $K$ in (4.18) are uniquely determined. Indeed, if $(Z,K)$ and $(Z',K')$ satisfy (4.18), then the predictable processes $Z$ and $Z'$ coincide at the jump times and can be identified almost surely with respect to $p(\omega,ds\,dy\,db)\,\mathbb P^{t,x,a}(d\omega)$ (a similar argument can be found in the proof of Proposition 4.1, to which we refer for more details). Finally, recalling that the jumps of $p$ are totally inaccessible, we also obtain the uniqueness of the component $K$. The uniqueness of $Z$ and $K$ entails that the whole sequences $(Z^n)_n$ and $(K^n)_n$ respectively converge (in the sense of points (i) and (ii) above) to $Z$ and $K$.

It remains to show that the jump constraint (4.2) is satisfied. To this end, we consider the
functional on $\mathbf{L}^2_{t,x,a}(q)$ given by

$$G : Z \longmapsto \mathbb{E}^{t,x,a}\left[\int_t^T\!\!\int_A [Z_s(X_{s-},b)]^+\,\lambda_0(db)\,ds\right].$$

From the uniform estimate (4.11), we see that $G(Z^n) \to 0$ as $n \to \infty$. Since $G$ is convex and
continuous in the strong topology of $\mathbf{L}^2_{t,x,a}(q)$, $G$ is lower semicontinuous in the
weak topology of $\mathbf{L}^2_{t,x,a}(q)$, see, e.g., Corollary 3.9 in [5]. Therefore, we find

$$G(Z) \le \liminf_{n\to\infty} G(Z^n) = 0,$$


from which follows the validity of the jump constraint (4.2) on $[t,T]$. We have thus shown that
$(Y,Z,K)$ is a solution to the constrained BSDE (4.1)-(4.2). It remains to prove that this is the
minimal solution. To this end, fix $n \in \mathbb{N}$ and consider a triple $(\bar Y,\bar Z,\bar K) \in \mathbf{S}^2_{t,x,a} \times \mathbf{L}^2_{t,x,a}(q) \times \mathbf{K}^2_{t,x,a}$
satisfying (4.1)-(4.2). For any $\nu \in \mathcal{V}_n$, by introducing the compensated martingale measure $q^\nu$,
we see that the solution $(\bar Y,\bar Z,\bar K)$ satisfies: $\mathbb{P}^{t,x,a}_\nu$-a.s.,

$$\bar Y_s = g(X_T) + \int_s^T f(r,X_r,I_r)\,dr + \bar K_T - \bar K_s - \int_s^T\!\!\int_{E\times A} \bar Z_r(y,b)\,q^\nu(dr\,dy\,db) - \int_s^T\!\!\int_A \nu_r(b)\,\bar Z_r(X_r,b)\,\lambda_0(db)\,dr, \quad s \in [t,T]. \tag{4.19}$$

By taking the conditional expectation given $\mathcal{F}_s$ under $\mathbb{P}^{t,x,a}_\nu$ in (4.19), recalling Lemma 3.2 and that $\bar K$ is nondecreasing,
we have

$$\begin{aligned}
\bar Y_s &\ge \mathbb{E}^{t,x,a}_\nu\Big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr \,\Big|\, \mathcal{F}_s\Big] - \mathbb{E}^{t,x,a}_\nu\Big[\int_s^T\!\!\int_A \nu_r(b)\,\bar Z_r(X_r,b)\,\lambda_0(db)\,dr \,\Big|\, \mathcal{F}_s\Big] \\
&\ge \mathbb{E}^{t,x,a}_\nu\Big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr \,\Big|\, \mathcal{F}_s\Big], \qquad s \in [t,T],
\end{aligned} \tag{4.20}$$

since $\nu$ is valued in $(0,n]$ and $\bar Z$ satisfies constraint (4.2). As $\nu$ is arbitrary in $\mathcal{V}_n$, we get from the
representation formula (4.7) that $Y^n_s \le \bar Y_s$, $\forall s \in [t,T]$, $\forall n \in \mathbb{N}$. In particular, $Y_s = \lim_{n\to\infty} Y^n_s \le \bar Y_s$,
i.e., the minimality property holds. The uniqueness of the minimal solution follows directly
from Proposition 4.1.

To conclude the proof, we argue on the limiting behavior of the dual representation for $Y^n$
when $n$ goes to infinity. Since $\mathcal{V}_n \subset \mathcal{V}$, it is clear from the representation (4.7) that, for all $n$
and $s \in [t,T]$,

$$Y^n_s \le \operatorname*{ess\,sup}_{\nu\in\mathcal{V}} \mathbb{E}^{t,x,a}_\nu\Big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr \,\Big|\, \mathcal{F}_s\Big].$$

Moreover, since $Y$ is the pointwise limit of $Y^n$, we deduce that

$$Y_s = \lim_{n\to\infty} Y^n_s \le \operatorname*{ess\,sup}_{\nu\in\mathcal{V}} \mathbb{E}^{t,x,a}_\nu\Big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr \,\Big|\, \mathcal{F}_s\Big]. \tag{4.21}$$

On the other hand, for any $\nu \in \mathcal{V}$, introducing the compensated martingale measure $q^\nu$ under $\mathbb{P}^{t,x,a}_\nu$
as usual, we see that $(Y,Z,K)$ satisfies

$$Y_s = g(X_T) + \int_s^T f(r,X_r,I_r)\,dr + K_T - K_s - \int_s^T\!\!\int_{E\times A} Z_r(y,b)\,q^\nu(dr\,dy\,db) - \int_s^T\!\!\int_A Z_r(X_r,b)\,\nu_r(b)\,\lambda_0(db)\,dr, \quad s \in [t,T]. \tag{4.22}$$

Arguing in the same way as in (4.20), we obtain

$$Y_s \ge \mathbb{E}^{t,x,a}_\nu\Big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr \,\Big|\, \mathcal{F}_s\Big],$$

so that $Y_s \ge \operatorname*{ess\,sup}_{\nu\in\mathcal{V}} \mathbb{E}^{t,x,a}_\nu\big[g(X_T) + \int_s^T f(r,X_r,I_r)\,dr \,\big|\, \mathcal{F}_s\big]$ by the arbitrariness of $\nu \in \mathcal{V}$.
Together with (4.21) this gives the required equality.

□


5 A BSDE representation for the value function 

In this section we conclude the last step in the method of control randomization and we show that
the minimal solution to the constrained BSDE (4.1)-(4.2) actually provides a nonlinear Feynman-Kac
representation of the solution to the Hamilton-Jacobi-Bellman (HJB) equation (2.7)-(2.8),
that we re-write here:

$$-\frac{\partial v}{\partial t}(t,x) = \sup_{a\in A}\big(\mathcal{L}^a_E v(t,x) + f(t,x,a)\big), \qquad v(T,x) = g(x).$$

As a consequence of the dual representation in Theorem 4.2 it follows that the value function of
the original optimal control problem can be identified with the dual one, which in particular turns
out to be independent of the variable $a$.

For our result we need the following conditions:

$$\sup_{x\in E,\,a\in A} \lambda(x,a,E) < \infty, \tag{5.1}$$

$$\lambda \text{ is a Feller transition kernel}, \tag{5.2}$$

$$f \in C_b([0,T]\times E\times A), \qquad g \in C_b(E). \tag{5.3}$$


We note that these assumptions are stronger than those required in Theorem 2.6 and therefore
they imply that there exists a unique solution $v \in LSC_b([0,T]\times E)$ to the HJB equation in the
sense of Definition 2.1. If, in addition, $A$ is a compact metric space then $v \in C_b([0,T]\times E)$ by
Corollary 2.8.

Let us consider again the Markov process $(X,I)$ in $E\times A$ constructed in section 3.1, with
corresponding family of probability measures $\mathbb{P}^{t,x,a}$ and generator $\mathcal{L}$ introduced in (3.4). Since
(5.1)-(5.3) are also stronger than (H$\lambda$) and (Hfg), by Theorem 4.2 there exists a unique minimal solution
to the BSDE (4.1)-(4.2).

Our main result is as follows: 

Theorem 5.1. Assume (5.1), (5.2), (5.3). Let $v$ be the unique solution to the Hamilton-Jacobi-Bellman
equation provided by Theorem 2.6. Then for every $(t,x,a) \in [0,T]\times E\times A$,

$$v(t,x) = Y^{t,x,a}_t,$$

where $Y^{t,x,a}$ is the first component of the minimal solution to the constrained BSDE with
nonpositive jumps (4.1)-(4.2).

More generally, we have $\mathbb{P}^{t,x,a}$-a.s.,

$$v(s,X_s) = Y^{t,x,a}_s, \qquad s \in [t,T].$$

Finally, for the value function V of the optimal control problem defined in (HID and the dual value 
function v* defined in (1331) we have the equalities 

$$V(t,x) = v(t,x) = Y^{t,x,a}_t = v^*(t,x,a).$$

In particular, the latter functions do not depend on $a$.

The rest of this section is devoted to the proof of Theorem 5.1.


5.1 A penalized HJB equation 

Let us recall the penalized BSDE associated to (4.1)-(4.2): $\mathbb{P}^{t,x,a}$-a.s.,

$$\begin{aligned}
Y^{n,t,x,a}_s ={}& g(X_T) + \int_s^T f(r,X_r,I_r)\,dr - \int_s^T\!\!\int_{E\times A} Z^{n,t,x,a}_r(y,b)\,q(dr\,dy\,db) \\
&+ \int_s^T\!\!\int_A \big\{ n\,[Z^{n,t,x,a}_r(X_r,b)]^+ - Z^{n,t,x,a}_r(X_r,b) \big\}\,\lambda_0(db)\,dr, \qquad s \in [t,T].
\end{aligned} \tag{5.4}$$


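Let us now consider the parabolic semi-linear penalized integro-differential equation, of HJB type:
for any $n \ge 1$,

$$\begin{aligned}
&\frac{\partial v^n}{\partial t}(t,x,a) + \mathcal{L}v^n(t,x,a) + f(t,x,a) \\
&\quad + \int_A \big\{ n\,[v^n(t,x,b) - v^n(t,x,a)]^+ - [v^n(t,x,b) - v^n(t,x,a)] \big\}\,\lambda_0(db) = 0 \quad \text{on } [0,T)\times E\times A,
\end{aligned} \tag{5.5}$$

$$v^n(T,x,a) = g(x) \quad \text{on } E\times A. \tag{5.6}$$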


The following lemma states that the solution of (5.5)-(5.6) can be represented probabilistically
by means of the solution to the penalized BSDE (5.4):

Lemma 5.2. Assume (5.1), (5.2), (5.3). Then there exists a unique function $v^n \in C_b([0,T]\times E\times A)$
such that $t \mapsto v^n(t,x,a)$ is continuously differentiable on $[0,T]$ and (5.5)-(5.6) hold for
every $(t,x,a) \in [0,T)\times E\times A$.

Moreover, for every $(t,x,a) \in [0,T]\times E\times A$ and for every $n \in \mathbb{N}$,

$$Y^{n,t,x,a}_s = v^n(s,X_s,I_s), \tag{5.7}$$

$$Z^{n,t,x,a}_s(y,b) = v^n(s,y,b) - v^n(s,X_{s-},I_{s-}), \tag{5.8}$$

(to be understood as an equality between elements of the space $\mathbf{S}^2_{t,x,a} \times \mathbf{L}^2_{t,x,a}(q)$) so that in
particular $v^n(t,x,a) = Y^{n,t,x,a}_t$.

Proof. We first note that $v^n \in C_b([0,T]\times E\times A)$ is the required solution if and only if

$$v^n(t,x,a) = g(x) + \int_t^T \mathcal{L}v^n(s,x,a)\,ds + \int_t^T f^n\big(s,x,a,v^n(s,x,\cdot) - v^n(s,x,a)\big)\,ds \tag{5.9}$$

for $t \in [0,T)$, $x \in E$, $a \in A$, where $f^n(t,x,a,\psi)$ is the map defined in (4.6). We use a fixed point
argument, introducing a map $\Gamma$ from $C_b([0,T]\times E\times A)$ to itself setting $v = \Gamma(w)$ where

$$v(t,x,a) = g(x) + \int_t^T \mathcal{L}w(s,x,a)\,ds + \int_t^T f^n\big(s,x,a,w(s,x,\cdot) - w(s,x,a)\big)\,ds.$$

Using the boundedness assumptions on $\lambda$ and $\lambda_0$ it can be shown by standard arguments that
some iteration of the above map is a contraction in the space of bounded measurable real functions
on $[0,T]\times E\times A$ endowed with the supremum norm and therefore the map $\Gamma$ has a unique fixed
point, which is the required solution $v^n$.
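The fixed point argument can be illustrated numerically. The following is only a minimal sketch under simplifying assumptions that are not part of the paper: $E$ and $A$ are finite, $f$ does not depend on time, and the time integral in (5.9) is replaced by a sum over the grid nodes strictly after $t_i$. All names (`penalized_rhs`, `gamma`, `solve_penalized`, the toy data `G`, `LAM`, `LAM0`, `F`) are hypothetical. With this quadrature the discretized map is of Volterra type, so the Picard iteration stabilizes after finitely many steps.

```python
# Toy data (assumed for illustration): 2 states, 2 actions.
G = [0.0, 1.0]                          # terminal gain g(x)
LAM0 = [0.5, 0.5]                       # lambda_0(b), full support on A
LAM = [[[0.0, 1.0], [0.0, 2.0]],        # LAM[x][a][y] = lambda(x, a, {y})
       [[1.0, 0.0], [0.5, 0.0]]]
F = [[0.0, 1.0], [0.5, 0.0]]            # running gain f(x, a), time independent


def penalized_rhs(w, lam, lam0, f, n):
    """Integrand of (5.9) at one time node: jump part + f + n * penalization."""
    nx, na = len(w), len(w[0])
    out = [[0.0] * na for _ in range(nx)]
    for x in range(nx):
        for a in range(na):
            jump = sum(lam[x][a][y] * (w[y][a] - w[x][a]) for y in range(nx))
            penal = n * sum(lam0[b] * max(w[x][b] - w[x][a], 0.0) for b in range(na))
            out[x][a] = jump + f[x][a] + penal
    return out


def gamma(w_path, g, lam, lam0, f, n, dt):
    """Discretized map Gamma: v(t_i) = g + dt * sum over nodes j > i of RHS(w(t_j))."""
    N = len(w_path) - 1
    nx, na = len(g), len(lam0)
    acc = [[0.0] * na for _ in range(nx)]
    v_path = [None] * (N + 1)
    v_path[N] = [[g[x]] * na for x in range(nx)]
    for i in range(N - 1, -1, -1):
        rhs = penalized_rhs(w_path[i + 1], lam, lam0, f, n)
        for x in range(nx):
            for a in range(na):
                acc[x][a] += dt * rhs[x][a]
        v_path[i] = [[g[x] + acc[x][a] for a in range(na)] for x in range(nx)]
    return v_path


def sup_dist(u, v):
    """Supremum norm of the difference of two paths indexed by [time][x][a]."""
    return max(abs(p - q)
               for ui, vi in zip(u, v)
               for ur, vr in zip(ui, vi)
               for p, q in zip(ur, vr))


def solve_penalized(g, lam, lam0, f, n, T=1.0, N=100):
    """Picard iteration w -> Gamma(w), started from w = 0, until it stabilizes."""
    nx, na = len(g), len(lam0)
    dt = T / N
    w = [[[0.0] * na for _ in range(nx)] for _ in range(N + 1)]
    for _ in range(N + 2):              # at most N + 2 sweeps are needed on this grid
        v = gamma(w, g, lam, lam0, f, n, dt)
        if sup_dist(v, w) == 0.0:
            break
        w = v
    return w
```

For instance, `solve_penalized(G, LAM, LAM0, F, n=5)[0][x][a]` is the resulting approximation of $v^n(0,x,a)$ for $n = 5$ in this toy model. The finite termination is a feature of the grid, whereas in the continuous setting one only knows that some iterate of $\Gamma$ is a contraction.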

We finally prove the identifications (5.7)-(5.8). Since $v^n \in C_b([0,T]\times E\times A)$ we can apply
the Itô formula to the process $v^n(s,X_s,I_s)$, $s \in [t,T]$, obtaining, $\mathbb{P}^{t,x,a}$-a.s.,

$$\begin{aligned}
v^n(s,X_s,I_s) ={}& v^n(t,x,a) + \int_t^s \Big( \frac{\partial v^n}{\partial r}(r,X_r,I_r) + \mathcal{L}v^n(r,X_r,I_r) \Big)\,dr \\
&+ \int_t^s\!\!\int_{E\times A} \big( v^n(r,y,b) - v^n(r,X_{r-},I_{r-}) \big)\,q(dr\,dy\,db), \qquad s \in [t,T].
\end{aligned}$$

Taking into account that $v^n$ satisfies (5.5)-(5.6) and that $(X,I)$ has piecewise constant trajectories,
we obtain $\mathbb{P}^{t,x,a}$-a.s.,

$$\frac{\partial v^n}{\partial r}(r,X_r,I_r) + \mathcal{L}v^n(r,X_r,I_r) + f^n\big(r,X_r,I_r,v^n(r,X_r,\cdot) - v^n(r,X_r,I_r)\big) = 0,$$

for almost all $r \in [t,T]$. It follows that, $\mathbb{P}^{t,x,a}$-a.s.,

$$\begin{aligned}
v^n(s,X_s,I_s) ={}& v^n(t,x,a) - \int_t^s f^n\big(r,X_r,I_r,v^n(r,X_r,\cdot) - v^n(r,X_r,I_r)\big)\,dr \\
&+ \int_t^s\!\!\int_{E\times A} \big( v^n(r,y,b) - v^n(r,X_{r-},I_{r-}) \big)\,q(dr\,dy\,db), \qquad s \in [t,T].
\end{aligned}$$

Since $v^n(T,x,a) = g(x)$ for all $(x,a) \in E\times A$, writing this identity at $s = T$ and subtracting shows that

$$\begin{aligned}
v^n(s,X_s,I_s) ={}& g(X_T) + \int_s^T f^n\big(r,X_r,I_r,v^n(r,X_r,\cdot) - v^n(r,X_r,I_r)\big)\,dr \\
&- \int_s^T\!\!\int_{E\times A} \big( v^n(r,y,b) - v^n(r,X_{r-},I_{r-}) \big)\,q(dr\,dy\,db), \qquad s \in [t,T].
\end{aligned}$$

Therefore the pairs $(Y^{n,t,x,a}_s, Z^{n,t,x,a}_s(y,b))$ and $(v^n(s,X_s,I_s), v^n(s,y,b) - v^n(s,X_{s-},I_{s-}))$ are both
solutions to the same BSDE under $\mathbb{P}^{t,x,a}$, and thus they coincide as members of the space
$\mathbf{S}^2_{t,x,a} \times \mathbf{L}^2_{t,x,a}(q)$. The required equalities (5.7)-(5.8) follow. In particular we have that
$v^n(t,x,a) = Y^{n,t,x,a}_t$. □


5.2 Convergence of the penalized solutions and conclusion of the proof 

We study the behavior of the functions v n as n —> oo. To this end we first show that they are 
bounded above by the solution to the HJB equation. 

Lemma 5.3. Assume (5.1), (5.2), (5.3). Let $v$ denote the solution to the HJB equation as
provided by Theorem 2.6 and $v^n$ the solution to (5.5)-(5.6) as provided in Lemma 5.2. Then, for
all $(t,x,a) \in [0,T]\times E\times A$ and $n \ge 1$,

$$v(t,x) \ge v^n(t,x,a).$$






Proof. Let $v : [0,T]\times E \to \mathbb{R}$ be a solution to the HJB equation. As in the proof of Proposition
m we have the identity 

$$g(X_T) - v(t,X_t) = \int_t^T \frac{\partial v}{\partial r}(r,X_r)\,dr + \int_{(t,T]}\int_{E\times A} \big( v(r,y) - v(r,X_{r-}) \big)\,p(dr\,dy\,db),$$


which follows from the absolute continuity of $t \mapsto v(t,x)$, taking into account that $X$ is constant
between jump times and using the definition of the random measure $p$ given in (3.2) and the fact
that $v$ depends on $t,x$ only. Since $v$ is a solution to the HJB equation we have, for all $x \in E$,
$a \in A$,

$$-\frac{\partial v}{\partial t}(t,x) \ge \mathcal{L}^a_E v(t,x) + f(t,x,a) = \int_E \big( v(t,y) - v(t,x) \big)\,\lambda(x,a,dy) + f(t,x,a),$$

for almost all $t \in [0,T]$. Taking into account that $(X,I)$ has piecewise constant trajectories we
obtain


$$\begin{aligned}
g(X_T) - v(t,X_t) \le{}& \int_{(t,T]}\int_{E\times A} \big( v(r,y) - v(r,X_{r-}) \big)\,p(dr\,dy\,db) \\
&- \int_t^T\!\!\int_E \big( v(r,y) - v(r,X_r) \big)\,\lambda(X_r,I_r,dy)\,dr - \int_t^T f(r,X_r,I_r)\,dr.
\end{aligned} \tag{5.10}$$


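Then, for any $n \ge 1$ and $\nu \in \mathcal{V}_n$ let us consider the probability $\mathbb{P}^{t,x,a}_\nu$ introduced above and
recall that under $\mathbb{P}^{t,x,a}_\nu$ the compensator of the random measure $p(dr\,dy\,db)$ is $\tilde p^\nu(dr\,dy\,db) =
\nu_r(b)\,\lambda_0(db)\,\delta_{\{X_{r-}\}}(dy)\,dr + \lambda(X_{r-},I_{r-},dy)\,\delta_{\{I_{r-}\}}(db)\,dr$. Noting that $v(r,y) - v(r,X_{r-})$ is
predictable and vanishes at $y = X_{r-}$, so that the $\lambda_0$-part of the compensator gives no contribution,
taking the expectation in (5.10) we obtain

$$\mathbb{E}^{t,x,a}_\nu[g(X_T)] - v(t,x) \le -\mathbb{E}^{t,x,a}_\nu \int_t^T f(r,X_r,I_r)\,dr.$$

Since $\nu \in \mathcal{V}_n$ was arbitrary, and recalling (4.7), we conclude that

$$v(t,x) \ge \sup_{\nu\in\mathcal{V}_n} \mathbb{E}^{t,x,a}_\nu\Big[ g(X_T) + \int_t^T f(r,X_r,I_r)\,dr \Big] = v^n(t,x,a).$$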


□ 


From Lemma 5.2 we know that $v^n(t,x,a) = Y^{n,t,x,a}_t$, and from Lemma 4.5 we know that
$v^n(t,x,a)$ is monotonically increasing in $n$ and uniformly bounded. Therefore we can define

$$\bar v(t,x,a) := \lim_{n\to\infty} v^n(t,x,a), \qquad t \in [0,T],\ x \in E,\ a \in A.$$

$\bar v$ is bounded, and from Lemma 5.3 we deduce that $\bar v \le v$. As an increasing limit of continuous
functions, $\bar v$ is lower semi-continuous. Further properties of $\bar v$ are proved in the following lemma.
In particular, (5.11) (or (5.12)) means that $\bar v$ is a supersolution to the HJB equation.

Lemma 5.4. Assume (5.1), (5.2), (5.3) and let $\bar v$ be the increasing limit of $v^n$. Then $\bar v$ does not
depend on $a$, i.e. $\bar v(t,x,a) = \bar v(t,x,b)$ for every $t \in [0,T]$, $x \in E$ and $a,b \in A$. Moreover, setting
$\bar v(t,x) = \bar v(t,x,a)$ we have

$$\bar v(t,x) - \bar v(t',x) \ge \int_t^{t'} \big( \mathcal{L}^a_E \bar v(s,x) + f(s,x,a) \big)\,ds, \qquad 0 \le t \le t' \le T,\ x \in E,\ a \in A. \tag{5.11}$$

More generally, for arbitrary Borel-measurable $\alpha : [0,T] \to A$ we have

$$\bar v(t,x) - \bar v(t',x) \ge \int_t^{t'} \big( \mathcal{L}^{\alpha(s)}_E \bar v(s,x) + f(s,x,\alpha(s)) \big)\,ds, \qquad 0 \le t \le t' \le T,\ x \in E. \tag{5.12}$$

Proof. $v^n$ satisfies the integral equation (5.9), namely

$$\begin{aligned}
v^n(t,x,a) ={}& g(x) + \int_t^T\!\!\int_E \big( v^n(s,y,a) - v^n(s,x,a) \big)\,\lambda(x,a,dy)\,ds \\
&+ \int_t^T f(s,x,a)\,ds + n \int_t^T\!\!\int_A [v^n(s,x,b) - v^n(s,x,a)]^+\,\lambda_0(db)\,ds.
\end{aligned}$$

Since $v^n$ is a bounded sequence in $C_b([0,T]\times E\times A)$ converging pointwise to $\bar v$, setting $t = 0$,
dividing by $n$ and letting $n \to \infty$ we obtain

$$\int_0^T\!\!\int_A [\bar v(s,x,b) - \bar v(s,x,a)]^+\,\lambda_0(db)\,ds = 0. \tag{5.13}$$


Next we claim that $\bar v$ is right-continuous in $t$ on $[0,T)$, for fixed $x \in E$, $a \in A$. To prove this we
first note that, neglecting the term with the positive part $[\ldots]^+$, we have

$$\begin{aligned}
v^n(t',x,a) - v^n(t,x,a) &\le -\int_t^{t'}\!\!\int_E \big( v^n(s,y,a) - v^n(s,x,a) \big)\,\lambda(x,a,dy)\,ds - \int_t^{t'} f(s,x,a)\,ds \\
&\le C_0\,(t'-t),
\end{aligned} \tag{5.14}$$

for some constant $C_0 > 0$ and for all $0 \le t \le t' \le T$ and $n \ge 1$, where we have used again
the fact that $v^n$ is uniformly bounded. Now fix $t \in [0,T)$. Since, as already noticed, $\bar v$ is lower
semi-continuous we have $\bar v(t,x,a) \le \liminf_{s\downarrow t} \bar v(s,x,a)$. The required right continuity follows if
we can show that $\bar v(t,x,a) \ge \limsup_{s\downarrow t} \bar v(s,x,a)$. Suppose not. Then there exists $s_k \downarrow t$ such that
$\bar v(s_k,x,a)$ tends to some limit $l > \bar v(t,x,a)$. It follows that $\bar v(s_k,x,a) - \bar v(t,x,a) > C_0(s_k - t)$ for some
$k$ sufficiently large, and therefore also $v^n(s_k,x,a) - v^n(t,x,a) > C_0(s_k - t)$ for some $n$ sufficiently
large, contradicting (5.14). This contradiction shows that $\bar v$ is right-continuous in $t$ on $[0,T)$.

Then it follows from (5.13) and the right-continuity just proved that $\int_A [\bar v(t,x,b) - \bar v(t,x,a)]^+\,\lambda_0(db) = 0$ for every $x \in E$, $a \in A$,
$t \in [0,T]$. Therefore there exists $B \subset A$ (dependent on $t,x,a$) such that $B$ is a Borel set with
$\lambda_0(B) = 0$, and

$$\bar v(t,x,a) \ge \bar v(t,x,b'), \qquad b' \notin B. \tag{5.15}$$

Since $\lambda_0$ has full support, $B$ cannot contain any open ball. So given an arbitrary $b \in A$ we
can find a sequence $b_n \to b$, $b_n \notin B$. Writing (5.15) with $b_n$ instead of $b'$ and using the lower
semi-continuity of $\bar v$ we deduce that $\bar v(t,x,a) \ge \liminf_n \bar v(t,x,b_n) \ge \bar v(t,x,b)$. Since $a$ and $b$ were
arbitrary we finally conclude that $\bar v(t,x,a) = \bar v(t,x,b)$ for every $t \in [0,T]$, $x \in E$ and $a,b \in A$, so
that $\bar v(t,x,a)$ does not depend on $a$ and we can define $\bar v(t,x) = \bar v(t,x,a)$.

Passing to the limit as $n \to \infty$ in the first inequality of (5.14) we immediately obtain (5.11),
so it remains to prove (5.12). Let $\mathcal{A}(\bar v)$ denote the set of all Borel-measurable $\alpha : [0,T] \to A$ such
that (5.12) holds, namely for every $0 \le t \le t' \le T$, $x \in E$,

\begin{align}
\bar v(t,x) - \bar v(t',x) \ge{}& \int_t^{t'}\!\!\int_E \bar v(s,y)\,\lambda(x,\alpha(s),dy)\,ds \tag{5.16} \\
&- \int_t^{t'} \bar v(s,x)\,\lambda(x,\alpha(s),E)\,ds + \int_t^{t'} f(s,x,\alpha(s))\,ds. \tag{5.17}
\end{align}

Suppose that $\alpha_n \in \mathcal{A}(\bar v)$, $\alpha : [0,T] \to A$ is Borel-measurable and $\alpha_n(t) \to \alpha(t)$ for almost all
$t \in [0,T]$. Note that

$$\int_E \bar v(t,y)\,\lambda(x,a,dy) = \lim_{n\to\infty} \int_E v^n(t,y,a)\,\lambda(x,a,dy) \tag{5.18}$$

and the latter is an increasing limit. Since $v^n \in C_b([0,T]\times E\times A)$ and $\lambda$ is Feller, for any $n \ge 1$
the functions in the right-hand side of (5.18) are continuous in $(t,x,a)$ (see e.g. [3], Proposition
7.30) and therefore the left-hand side is a lower semicontinuous function of $(t,x,a)$. It follows
from this and the Fatou Lemma that

$$\begin{aligned}
\int_t^{t'}\!\!\int_E \bar v(s,y)\,\lambda(x,\alpha(s),dy)\,ds &\le \int_t^{t'} \liminf_{n\to\infty} \int_E \bar v(s,y)\,\lambda(x,\alpha_n(s),dy)\,ds \\
&\le \liminf_{n\to\infty} \int_t^{t'}\!\!\int_E \bar v(s,y)\,\lambda(x,\alpha_n(s),dy)\,ds.
\end{aligned}$$

Using this inequality and the continuity and boundedness of the maps $a \mapsto \lambda(x,a,E)$, $a \mapsto f(t,x,a)$
we see that the validity of inequality (5.16)-(5.17) for $\alpha_n$ implies that it also holds
for $\alpha$, hence $\alpha \in \mathcal{A}(\bar v)$.

Next we note that $\mathcal{A}(\bar v)$ contains all piecewise constant functions of the form $\alpha(t) =
\sum_{i=1}^k a_i\,1_{[t_i,t_{i+1})}(t)$ with $k \ge 1$, $0 = t_1 < t_2 < \ldots < t_{k+1} = T$, $a_i \in A$: indeed, it is enough to write
down (5.11) with $[t,t') = [t_i,t_{i+1})$ and sum up over $i$ to get (5.12) for $\alpha(\cdot)$ and therefore conclude
that $\alpha(\cdot) \in \mathcal{A}(\bar v)$. Since we have already proved that the class $\mathcal{A}(\bar v)$ is stable under almost everywhere
pointwise limits it follows that $\mathcal{A}(\bar v)$ contains all Borel-measurable functions $\alpha : [0,T] \to A$ as
required. □
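The statements of Lemma 5.3 and Lemma 5.4 can be observed on a toy example. The sketch below is not the paper's construction: it assumes finite $E$ and $A$, a time-independent $f$, and an explicit Euler time discretization of the HJB equation and of the penalized equation (5.5)-(5.6). The function names and the toy data are hypothetical. On this example the penalized values increase with $n$, stay below the HJB value, and their dependence on $a$ shrinks as $n$ grows.

```python
# Toy data (assumed for illustration): 2 states, 2 actions.
G = [0.0, 1.0]                          # terminal gain g(x)
LAM0 = [0.5, 0.5]                       # lambda_0(b), full support on A
LAM = [[[0.0, 1.0], [0.0, 2.0]],        # LAM[x][a][y] = lambda(x, a, {y})
       [[1.0, 0.0], [0.5, 0.0]]]
F = [[0.0, 1.0], [0.5, 0.0]]            # running gain f(x, a), time independent


def hjb_value(g, lam, f, T=1.0, N=400):
    """Explicit Euler scheme for -dv/dt = max_a (L^a_E v + f), v(T) = g; returns v(0, x)."""
    nx, na = len(g), len(f[0])
    dt = T / N
    v = list(g)
    for _ in range(N):
        v = [v[x] + dt * max(
                 sum(lam[x][a][y] * (v[y] - v[x]) for y in range(nx)) + f[x][a]
                 for a in range(na))
             for x in range(nx)]
    return v


def penalized_value(g, lam, lam0, f, n, T=1.0, N=400):
    """Explicit Euler scheme for the penalized equation (5.5)-(5.6); returns v^n(0, x, a)."""
    nx, na = len(g), len(lam0)
    dt = T / N
    w = [[g[x]] * na for x in range(nx)]
    for _ in range(N):
        w = [[w[x][a] + dt * (
                  sum(lam[x][a][y] * (w[y][a] - w[x][a]) for y in range(nx))
                  + f[x][a]
                  + n * sum(lam0[b] * max(w[x][b] - w[x][a], 0.0) for b in range(na)))
              for a in range(na)]
             for x in range(nx)]
    return w


def spread(w):
    """Largest gap max_a w(x, a) - min_a w(x, a) over the states x."""
    return max(max(row) - min(row) for row in w)
```

The step size must be small compared with the total jump rate plus $n\,\lambda_0(A)$ for the explicit scheme to be monotone; with the default `N=400` this holds here for $n$ up to 100. Comparing `spread(penalized_value(G, LAM, LAM0, F, n))` for growing `n` gives a numerical counterpart of the fact that the limit $\bar v$ does not depend on $a$.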


We are now ready to conclude the proof of our main result. 

Proof of Theorem 5.1. We will prove the inequality

$$\bar v(t,x) \ge V(t,x), \qquad t \in [0,T],\ x \in E, \tag{5.19}$$

where $\bar v = \lim_{n\to\infty} v^n$ was introduced before Lemma 5.4. Since we know that $\bar v \le v$ and, by
Theorem 2.9, $v = V$, it follows from (5.19) that $\bar v = v = V$. Passing to the limit as $n \to \infty$ in
(5.7) and recalling (4.4) all the other equalities follow immediately.

To prove (5.19) we fix $t \in [0,T]$, $x \in E$ and a Borel-measurable map $\alpha : [0,T]\times E \to A$, i.e.
an element of $\mathcal{A}_{ad}$, the set of admissible control laws for the primal control problem, and denote
by $\mathbb{P}^{t,x}_\alpha$ the associated probability measure on $(\Omega, \mathcal{F}_\infty)$, for the controlled system started at time $t$
from point $x$, as in section 2.2. We will prove that $\bar v(t,x) \ge J(t,x,\alpha)$, the gain functional defined
in (2.5). Recall that in $\Omega$ we had defined a canonical marked point process $(T_n,E_n)_{n\ge1}$ and the
associated random measure $p$. Fix $\omega \in \Omega$ and consider the points $T_n(\omega)$ lying in $(t,T]$, which
we rename $S_i$; thus, $t < S_1 < \ldots < S_k \le T$, for some $k$ (also depending on $\omega$). Recalling that
$\bar v(T,x) = g(x)$ we have


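$$\begin{aligned}
g(X_T) - \bar v(t,x) ={}& g(X_T) - \bar v(S_k,X_{S_k}) + \sum_{i=1}^k \big[ \bar v(S_i,X_{S_i}) - \bar v(S_i,X_{S_i-}) \big] \\
&+ \sum_{i=2}^k \big[ \bar v(S_i,X_{S_i-}) - \bar v(S_{i-1},X_{S_{i-1}}) \big] + \bar v(S_1,X_{S_1-}) - \bar v(t,x).
\end{aligned}$$

$\mathbb{P}^{t,x}_\alpha$-a.s. we have $X_{S_i-} = X_{S_{i-1}}$ ($2 \le i \le k$) and $X_{S_1-} = x$, so we obtain

$$\begin{aligned}
g(X_T) - \bar v(t,x) ={}& g(X_T) - \bar v(S_k,X_{S_k}) + \sum_{i=1}^k \big[ \bar v(S_i,X_{S_i}) - \bar v(S_i,X_{S_i-}) \big] \\
&+ \sum_{i=2}^k \big[ \bar v(S_i,X_{S_{i-1}}) - \bar v(S_{i-1},X_{S_{i-1}}) \big] + \bar v(S_1,x) - \bar v(t,x).
\end{aligned}$$

The first sum can be written as

$$\sum_{i=1}^k \big[ \bar v(S_i,X_{S_i}) - \bar v(S_i,X_{S_i-}) \big] = \int_t^T\!\!\int_E \big[ \bar v(s,y) - \bar v(s,X_{s-}) \big]\,p(ds\,dy),$$

while the other terms can be estimated from above by repeated applications of (5.12), taking into
account that $X$ is constant in the intervals $[t,S_1)$, $[S_{i-1},S_i)$ ($2 \le i \le k$) and $[S_k,T]$:

$$\begin{aligned}
\bar v(S_i,X_{S_{i-1}}) - \bar v(S_{i-1},X_{S_{i-1}}) &\le -\int_{S_{i-1}}^{S_i} \Big( \mathcal{L}^{\alpha(s,X_{S_{i-1}})}_E \bar v(s,X_{S_{i-1}}) + f\big(s,X_{S_{i-1}},\alpha(s,X_{S_{i-1}})\big) \Big)\,ds \\
&= -\int_{S_{i-1}}^{S_i} \Big( \mathcal{L}^{\alpha(s,X_s)}_E \bar v(s,X_s) + f\big(s,X_s,\alpha(s,X_s)\big) \Big)\,ds
\end{aligned}$$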


for $2 \le i \le k$, and similar formulae for the intervals $(t,S_1]$ and $(S_k,T]$. We end up with

$$g(X_T) - \bar v(t,x) \le \int_t^T\!\!\int_E \big[ \bar v(s,y) - \bar v(s,X_{s-}) \big]\,p(ds\,dy) - \int_t^T \Big( \mathcal{L}^{\alpha(s,X_s)}_E \bar v(s,X_s) + f\big(s,X_s,\alpha(s,X_s)\big) \Big)\,ds.$$

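Recalling that the compensator of the measure $p$ under $\mathbb{P}^{t,x}_\alpha$ is $1_{[t,\infty)}(s)\,\lambda(X_{s-},\alpha(s,X_{s-}),dy)\,ds$
we have, taking expectation,

$$\mathbb{E}^{t,x}_\alpha \int_t^T\!\!\int_E \big[ \bar v(s,y) - \bar v(s,X_{s-}) \big]\,p(ds\,dy) = \mathbb{E}^{t,x}_\alpha \int_t^T \mathcal{L}^{\alpha(s,X_s)}_E \bar v(s,X_s)\,ds,$$

which implies, by the previous inequality, $\mathbb{E}^{t,x}_\alpha[g(X_T)] - \bar v(t,x) \le -\mathbb{E}^{t,x}_\alpha \int_t^T f(s,X_s,\alpha(s,X_s))\,ds$
and so $\bar v(t,x) \ge J(t,x,\alpha)$. Since $\alpha \in \mathcal{A}_{ad}$ was arbitrary we conclude that $\bar v(t,x) \ge V(t,x)$.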

□ 
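The key inequality of the proof, namely that the HJB value dominates the gain $J(t,x,\alpha)$ of any feedback law, can be checked on a toy example. The sketch below makes assumptions that go beyond the text: $E$ and $A$ are finite, $f$ is time-independent, only stationary feedback laws $\alpha(x)$ are enumerated (the paper allows $\alpha(t,x)$), and both quantities are computed by an explicit Euler scheme. All function names and data are hypothetical.

```python
from itertools import product

# Toy data (assumed for illustration): 2 states, 2 actions.
G = [0.0, 1.0]                          # terminal gain g(x)
LAM = [[[0.0, 1.0], [0.0, 2.0]],        # LAM[x][a][y] = lambda(x, a, {y})
       [[1.0, 0.0], [0.5, 0.0]]]
F = [[0.0, 1.0], [0.5, 0.0]]            # running gain f(x, a), time independent


def hjb_value(g, lam, f, T=1.0, N=400):
    """Explicit Euler scheme for -dv/dt = max_a (L^a_E v + f), v(T) = g; returns v(0, x)."""
    nx, na = len(g), len(f[0])
    dt = T / N
    v = list(g)
    for _ in range(N):
        v = [v[x] + dt * max(
                 sum(lam[x][a][y] * (v[y] - v[x]) for y in range(nx)) + f[x][a]
                 for a in range(na))
             for x in range(nx)]
    return v


def gain(alpha, g, lam, f, T=1.0, N=400):
    """Explicit Euler scheme for the gain J(0, x, alpha) of a stationary feedback law alpha[x]."""
    nx = len(g)
    dt = T / N
    J = list(g)
    for _ in range(N):
        J = [J[x] + dt * (
                 sum(lam[x][alpha[x]][y] * (J[y] - J[x]) for y in range(nx))
                 + f[x][alpha[x]])
             for x in range(nx)]
    return J


def all_feedback_laws(nx, na):
    """All stationary feedback laws, encoded as tuples (alpha[0], ..., alpha[nx - 1])."""
    return list(product(range(na), repeat=nx))
```

For each law returned by `all_feedback_laws(2, 2)`, the vector `gain(alpha, G, LAM, F)` should not exceed `hjb_value(G, LAM, F)` componentwise, up to rounding, provided the time step times the largest total jump rate is below 1, so that the scheme is monotone.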